MOLECULAR SWITCHES AND METHODS FOR MAKING AND
USING THE SAME 

Field of the Invention 

The invention relates to fusion molecules which function as molecular 
switches and to methods for making and using the same. 

Background of the Invention 

Gene fusion technology, the fusion of two or more genes into a single gene,
has been widely used as a tool in protein engineering, localization and purification.
There are two conceptually different methods of making fusions. The simplest
method, end-to-end fusion, has been used almost exclusively. The second method,
insertional fusion, comprises the insertion of one gene into the middle of another
gene. Insertions can result in a continuous domain being split into a discontinuous
domain.

One of the first reports of successful insertion of one protein into another was
a study by Ehrmann, et al., Proc. Natl. Acad. Sci. USA 87: 7574-8, who described the
insertion of alkaline phosphatase (AP) into the E. coli outer membrane protein MalF,
as a tool for studying membrane topology. High levels of alkaline phosphatase
activity were obtained in the fusions despite the fact that alkaline phosphatase requires
dimerization for activity. Since then, AP has been successfully inserted into a number
of integral membrane proteins (see, e.g., Bibi and Beja, 1994, J. Biol. Chem. 269:
19910-5; Cosgriff and Pittard, 1997, J. Bacteriol. 179: 3317-23; Lacatena, et al.,
1994, Proc. Natl. Acad. Sci. USA 91: 10521-5; Pi and Pittard, 1996, J. Bacteriol. 178:
2650-5; Pigeon and Silver, 1994, Mol. Microbiol. 14: 871-881).

Other proteins, including green fluorescent protein (GFP) (Biondi, et al., 1998,
Nucleic Acids Res. 26: 4946-4952; Kratz, et al., 1999, Proc. Natl. Acad. Sci. USA 96;
Siegel and Isacoff, 1997, Neuron 19: 735-41; Siegel and Isacoff, 2000, Methods
Enzymol. 327: 249-59), TEM1 β-lactamase (Betton, et al., 1997, Nat. Biotechnology
15: 1276-1279; Collinet, et al., 2000, J. Biol. Chem. 275: 17428-33; Ehrmann, et al.,
1990, Proc. Natl. Acad. Sci. USA 87: 7574-8), thioredoxin (Lu, et al., 1995,
Biotechnology (N Y) 13: 366-72), dihydrofolate reductase (Collinet, et al., 2000, J.
Biol. Chem. 275: 17428-33), FKBP12 (Tucker and Fields, Nat. Biotechnol. 19:
1042-6), estrogen receptor-α (Tucker and Fields, 2000, supra), and β-xylanase (Ay, et al.,
1998, Proc. Natl. Acad. Sci. USA 95: 6613-6618), have been successfully inserted
into other proteins. Such fusions at least partially retain the function of the inserted
protein.

Doi, et al., 1999, FEBS Letters 453: 305-307, describe a fusion which
comprises an insertion of the β-lactamase inhibiting protein (BLIP) polypeptide into a
surface loop of the GFP protein. After several rounds of random mutagenesis,
polypeptides were obtained which exhibited increased fluorescence upon binding of a
ligand (β-lactamase) to the BLIP polypeptide.

More recently, yeast sensors for ligand binding were constructed by the
insertion of FKBP12 and the estrogen receptor-α ligand-binding domain into a
rationally chosen site in dihydrofolate reductase (DHFR) (see, e.g., Tucker and Fields,
2001, Nature Biotechnology 19: 1042-1046). The site of insertion was at residue 107,
a site previously shown to be tolerant of bisection (Pelletier, et al., 1998, Proc.
Natl. Acad. Sci. USA 95: 12141-12146). The two fragments of DHFR divided at 107
were found to be unable to reassemble to form an active enzyme unless the fragments
were fused to domains that dimerized (e.g., leucine zippers). Yeast
expressing the FKBP12-DHFR or ERα-DHFR fusion proteins had an approximate
two-fold increase in growth rate in the presence of their respective ligands (FK506
and estrogen) when DHFR activity limited growth. The fusion proteins were either
fortuitously temperature sensitive (ERα-DHFR) or designed to be so by mutation
(FKBP12-DHFR) in order that subtle changes in growth could be detected upon
addition of the ligand.

Generally, methods for generating fusion molecules have not provided a 
systematic way to functionally couple protein domains. 
Summary of the Invention

The invention provides molecular switches which couple external signals, 
including, but not limited to, the presence, absence or level of molecules, ligands, 
metabolites, ions, and the like, the presence, absence, or level of chemical, optical or 
electrical conditions, to functionality. Preferably, the switches are fusion molecules 
comprising an insertion sequence and an acceptor sequence for receiving the insertion 
sequence, wherein the state of the insertion sequence is coupled to the state of the 
acceptor sequence. For example, the activity of the insertion sequence can be coupled 
to the activity/state of the acceptor sequence. 

The "state" of a molecule can comprise its ability or latent ability to emit or 
absorb light, its ability or latent ability to change conformation, its ability or latent 
ability to bind to a ligand, to catalyze a substrate, transfer electrons, and the like. 
Preferably, molecular switches according to the invention are multistable, i.e., able to
switch between at least two states. In one aspect, the fusion molecule is bistable, i.e.,
a state is either "ON" or "OFF", for example, able to emit light or not, able to bind or
not, able to catalyze or not, able to transfer electrons or not, and so forth. In another
aspect, the fusion molecule is able to switch between more than two states. For
example, in response to a particular threshold state exhibited by an insertion sequence
or acceptor sequence, the respective other sequence of the fusion may exhibit a range
of states (e.g., a range of binding activity, a range of enzyme catalysis, etc.). Thus,
rather than switching from "ON" to "OFF", the fusion molecule can exhibit a graded
response to a stimulus. More generally, a molecular switch is one which generates a
measurable change in state in response to a signal.

In one aspect, a molecular switch can comprise a plurality of fusion molecules
responsive to a signal, which mediate a function in response to a change in state of at
least a portion of the molecule. As above, preferably, this change of state occurs in
response to a change in state of another portion of the molecule. While the states of
individual fusion molecules in the population may be ON or OFF, the aggregate
population of molecules may not be able to mediate the function unless a threshold
number of molecules switch states. Thus, the "state" of the population of molecules
may be somewhere in between ON or OFF, depending on the number of molecules
which have switched states. This provides an ability to more precisely tune a
molecular response to a signal by selecting for molecules which respond to a range of
signals and modifying the population of fusion molecules to provide selected numbers
of fusion molecules which respond to a narrow range or wider range of signal as
desired.
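
The population-level behavior described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each fusion molecule switches ON once the signal reaches a per-molecule threshold; the function names and the numeric thresholds are hypothetical and are not taken from the specification.

```python
from typing import Sequence


def fraction_switched(thresholds: Sequence[float], signal: float) -> float:
    """Fraction of molecules whose (assumed) ON threshold is met by the signal."""
    if not thresholds:
        raise ValueError("population must contain at least one molecule")
    switched = sum(1 for t in thresholds if signal >= t)
    return switched / len(thresholds)


def population_mediates_function(
    thresholds: Sequence[float], signal: float, required_fraction: float
) -> bool:
    """True when enough individual molecules have switched to mediate the function."""
    return fraction_switched(thresholds, signal) >= required_fraction


# Hypothetical populations: one responds over a narrow range of signal,
# the other over a wider range.
narrow_population = [0.9, 1.0, 1.0, 1.1]
wide_population = [0.2, 0.6, 1.0, 1.8]
```

In this sketch the narrow population goes from no molecules switched to all molecules switched over a small change in signal, while the wide population gives an intermediate aggregate state over a broader range of signal.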

In yet another aspect, the invention provides a fusion molecule comprising an
insertion sequence and an acceptor sequence. The insertion sequence or the acceptor
sequence localizes the fusion molecule intracellularly. Preferably, the fusion
molecule is associated with a bio-effective molecule and intracellular localization is
coupled to release of the bio-effective molecule from the fusion molecule.

The fusion molecules of the present invention also can comprise an insertion
sequence and acceptor sequence, wherein either the insertion sequence or the acceptor
sequence associates with a bio-effective molecule and disassociates from the
bio-effective molecule when the respective other sequence of the fusion binds to a cellular
marker of a pathological condition. In this aspect, the fusion molecule can be used to
target bio-effective molecules, such as drugs, to cells having specific pathologies
(e.g., cancer cells).

In still another aspect, the fusion molecule of the present invention is capable
of switching from a non-toxic state to a toxic state. Either the insertion sequence or
acceptor sequence may bind to a cellular marker of a pathology (e.g., a tumor
antigen). Binding of the marker to the fusion protein switches the fusion protein from
a non-toxic to a toxic state.

In a further aspect, the fusion molecule comprises a molecular switch for
controlling a cellular pathway. The fusion molecule comprises an insertion sequence
and an acceptor sequence and the states of the insertion sequence and acceptor
sequence are coupled, such that the state of either the inserted sequence or the
acceptor sequence modulates the activity or expression of a molecular pathway
molecule in a cell. The invention can be used to modulate cellular responses using
exogenous or endogenous binding molecules (e.g., ligands, small molecules, ions,
metabolites, and the like) to transduce a desired signal.

In another aspect, the invention provides a fusion protein comprising an
insertion sequence and an acceptor sequence, wherein either the insertion sequence or
the acceptor sequence binds to a DNA molecule, and wherein DNA binding activity is
coupled to the response of the respective other sequence of the fusion molecule to a
signal. Preferably, the DNA to which the fusion molecule binds is a nucleic acid
regulatory sequence for regulating the activity of another nucleic acid molecule (e.g.,
modulating transcription, translation, replication, recombination, supercoiling, etc., of
the other nucleic acid molecule).

The invention also provides a sensor molecule comprising an insertion 
sequence and an acceptor sequence, wherein either the insertion sequence or acceptor 
sequence binds to a target molecule and wherein the respective other sequence 
generates a signal in response to binding. Preferably, the acceptor sequence 
comprises a deletion and/or duplication at the insertion site. 

The invention also provides a combinatorial method for generating any of the
molecular switches described above. Such an approach provides a means to
systematically examine all or a substantial fraction of all possible fusions between
insertion sequences and acceptor sequences, including ones in which deletions and
tandem duplications occur at the insertion site. Preferably, given an acceptor
sequence comprising a given number of monomers (e.g., bases, amino acids, etc.), at
least about the same number of different fusions are generated, and more preferably,
at least about twice this number of fusions are generated.
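
To make the size of this fusion space concrete, the following sketch enumerates, in silico, the fusions obtainable from one acceptor sequence and one insertion sequence, including those with a small deletion or tandem duplication at the insertion site. It is an illustration under stated assumptions only: the names `Fusion` and `enumerate_fusions`, the `max_indel` parameter, and the toy sequences are hypothetical, and the sketch is not the cloning procedure of the invention.

```python
from typing import List, NamedTuple


class Fusion(NamedTuple):
    n_end: int      # acceptor[:n_end] precedes the insertion sequence
    c_start: int    # acceptor[c_start:] follows the insertion sequence
    kind: str       # "exact", "deletion", or "duplication"
    sequence: str


def enumerate_fusions(acceptor: str, insert: str, max_indel: int = 1) -> List[Fusion]:
    """List internal insertions, allowing up to max_indel monomers to be
    deleted or tandemly duplicated at the insertion site (assumed limit)."""
    n = len(acceptor)
    fusions = []
    for n_end in range(1, n):
        lowest = max(1, n_end - max_indel)
        highest = min(n - 1, n_end + max_indel)
        for c_start in range(lowest, highest + 1):
            if c_start == n_end:
                kind = "exact"          # no monomers lost or repeated
            elif c_start > n_end:
                kind = "deletion"       # acceptor[n_end:c_start] is missing
            else:
                kind = "duplication"    # acceptor[c_start:n_end] appears twice
            sequence = acceptor[:n_end] + insert + acceptor[c_start:]
            fusions.append(Fusion(n_end, c_start, kind, sequence))
    return fusions


# Toy example: a four-monomer acceptor and a two-monomer insertion sequence.
toy_library = enumerate_fusions("ABCD", "xy")
```

For the toy four-monomer acceptor, allowing a single-monomer deletion or duplication yields seven fusions, of which three are exact insertions, which is more than the number of monomers in the acceptor.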

In one aspect, the method comprises domain insertion, i.e., randomly inserting
an insertion sequence into an acceptor sequence and selecting for a fusion molecule in
which the state of the insertion sequence is coupled to the state of the acceptor
molecule. In another aspect, however, the method comprises generating first and
second molecules with dimerization domains and selecting for molecules which
dimerize in response to a condition, e.g., upon binding to a signaling
molecule.

The invention also provides a method for assembling a modulatable fusion
molecule, comprising: randomly inserting an insertion sequence into an acceptor
sequence, wherein the insertion sequence and the acceptor sequence each comprise a
state (e.g., an activity), thereby generating a fusion molecule, and selecting a
fusion molecule wherein insertion couples a change in state of the insertion sequence
to a change in the state of the acceptor sequence. In one aspect, an activity of the
insertion sequence is modulated, preferably, in response to a change in a state of the
acceptor sequence. In another aspect, the activity of the acceptor sequence is
modulated, preferably in response to a change in the state of the insertion sequence.
Insertion of the insertion sequence into the acceptor sequence, in some cases, may
generate a new state (e.g., a new activity). The process of randomly inserting may
generate a duplication or deletion at the insertion site, thereby increasing the numbers
of types of fusions that can be examined.

The invention also provides a method for assembling a multistable fusion
molecule which can switch between at least an active state and a less active state, and
in some cases, an inactive state. The method comprises randomly inserting an
insertion sequence into an acceptor sequence, thereby generating a fusion molecule,
wherein either the insertion sequence or the acceptor sequence comprises an activity,
and wherein the respective other sequence is responsive to a signal. A fusion
molecule is selected in which activity is coupled to the signal such that the fusion
molecule switches state in response to the signal. The signal can comprise binding of
a ligand, a change in conformation, a chemical, optical, electrical, magnetic signal, the
absence of such conditions, and the like. In one aspect, the method comprises
randomly inserting an insertion sequence responsive to a signal into an acceptor
sequence comprising an activity, thereby generating a fusion molecule, and selecting
for a fusion molecule wherein the activity of the acceptor sequence is responsive to
the signal.

Preferably, the insertion sequence and acceptor sequence comprise
polypeptides and in one aspect, the step of randomly inserting the insertion molecule
into the acceptor molecule comprises obtaining a first nucleic acid fragment encoding
the insertion polypeptide and a second nucleic acid fragment encoding the acceptor
polypeptide and randomly inserting the first nucleic acid fragment into the second
nucleic acid fragment. The method may further comprise the step of digesting the
second nucleic acid with a nuclease such as DNase I, S1 nuclease, mung bean
nuclease, a restriction endonuclease, or a combination thereof, shearing the second
nucleic acid (e.g., mechanically), or otherwise treating the second nucleic acid to
introduce breaks (e.g., exposing the nucleic acid to chemical agents and/or radiation).
The nucleic acid sequence encoding the insertion sequence may also be digested,
sheared, or otherwise treated, to generate random fragments of the insertion sequence.
Preferably, such fragments are inserted at random into the sites of breaks in the
nucleic acid sequence encoding the acceptor molecule caused by the nuclease
digestion.

The step of insertion can be repeated a plurality of times with a plurality of 
first and second nucleic acid molecules, either sequentially or simultaneously, to 
generate a library of acceptor polypeptides comprising randomly inserted insertion 
polypeptide sequences. The library can be used to identify fusion polypeptides 
wherein the states of the insertion polypeptide and acceptor polypeptide are coupled, 
and preferably, responsive to a signal. 

In one aspect, the library comprises members comprising insertions with
deletions at the insertion site, insertions with tandem duplications at the insertion site,
and insertions with neither duplications nor deletions.

The invention also provides expression vectors for expression of the fusion 
molecules as well as host cells for expressing the fusion molecules. Host cells can
include microorganisms, animal cells, and plant cells. In one aspect, fusion molecules 
are expressed in one or more cells of a transgenic organism. Fusion molecules 
according to the invention can thus be used to provide a conditional knockout or 
knock-in of a biomolecule in a cell. 

The invention further provides a method for modulating a cellular activity 
comprising providing any of the fusion molecules described above, wherein a change
in state of at least the insertion sequence or the acceptor sequence modulates a cellular
activity, and wherein the change in state which modulates the cellular activity is
coupled to a change in state of the respective other portion of the fusion molecule.
The cellular activity is modulated by changing the state of the respective other portion 
of the fusion molecule. 

In another aspect, the invention provides a method for delivering a bio- 
effective molecule to a cell. The method comprises providing a fusion molecule 
associated with a bio-effective molecule to the cell, the fusion molecule comprising
an insertion sequence and an acceptor sequence. Preferably, either the insertion
sequence or the acceptor sequence binds to a cellular marker of a pathological
condition and upon binding to the marker, the fusion molecule dissociates from the
bio-effective molecule, thereby delivering the molecule to the cell.

In still another aspect, the invention provides a method for delivering a
bio-effective molecule intracellularly. The method comprises providing a fusion
molecule associated with a bio-effective molecule to the cell, the fusion molecule
comprising an insertion sequence and an acceptor sequence. Either the insertion
sequence or acceptor sequence comprises a transport sequence for transporting the
fusion molecule intracellularly. Preferably, release of the bio-effective molecule from
the fusion molecule is coupled to transport of the fusion molecule intracellularly.
Preferably, either the inserted sequence or the acceptor sequence is capable of binding
to a biomolecule and binding of the fusion molecule with the biomolecule transports
the fusion molecule intracellularly and disassociates the bio-effective molecule from
the fusion molecule.

The invention also provides a method for modulating a molecular pathway in a
cell. The method comprises providing a fusion molecule to the cell, the fusion
molecule comprising an insertion sequence and an acceptor sequence. The states of
the insertion sequence and acceptor sequence are coupled and responsive to a signal,
and the state of either the insertion sequence or the acceptor sequence modulates the
activity or expression of a molecular pathway molecule in the cell. Upon exposure of
the fusion molecule to the signal, the fusion molecule is thus able to modulate the
molecular pathway.

The invention additionally provides a method for controlling the activity of a
nucleic acid regulatory sequence. The method comprises providing a fusion molecule
which comprises an insertion sequence and an acceptor sequence, wherein either the
insertion sequence or the acceptor sequence responds to a signal, and wherein the
respective other sequence of the fusion molecule binds to the nucleic acid regulatory
sequence when the signal is responded to. Exposing the fusion molecule to the signal
modulates the activity of the nucleic acid regulatory sequence. Types of activities
regulated include, but are not limited to, modulating transcription, translation,
replication, recombination, or supercoiling.

The invention also provides a method for generating a conditional
heterodimer, comprising: providing a plurality of randomly bisected molecules, each
bisected molecule comprising a first portion and a second portion, wherein the first
and second portions are fused to first and second dimerization domains respectively,
and wherein a function of the bisected molecule is altered by bisection. By selecting
for restoration of function of a bisected molecule in response to a signal, a conditional
heterodimer may be obtained.

In one aspect, a conditional heterodimer is used to conditionally provide an
activity to a cell. Preferably, the dimerization is mediated by a signal, such as binding
of a drug to the dimerization domain, such that the activity can be triggered by
administering a drug to the cell.

Brief Description of the Figures 

The objects and features of the invention can be better understood with 
reference to the following detailed description and accompanying drawings. 

Figures 1A-C are schematic diagrams illustrating strategies for generating
molecular switches according to the invention. Figure 1A shows a domain insertion
strategy according to one aspect of the invention. Figure 1B shows conditional
heterodimers according to another aspect of the invention. Figure 1C shows a
strategy for generating an enzyme-binding protein hybrid according to one aspect of
the invention. As shown in Figure 1C, catalytic activity of an enzyme domain of the
fusion molecule is coupled to binding of the fusion molecule to a signaling protein
(protein B).

Figures 2A-D show cloning steps in generating libraries of fusion molecules
according to one aspect of the invention. Figure 2A shows preparation of a nucleic
acid encoding an insertion sequence (e.g., β-lactamase) for subsequent cloning steps.
Figure 2B shows random insertion of the insertion sequence into acceptor sequences
digested with a nuclease. Figure 2C shows a variation of the insertion method shown
in 2B which comprises incremental truncation. Figure 2D is a flow chart illustrating
selection of active fusions according to one aspect of the invention.

Figures 3A-G illustrate methods of using molecular switches according to
aspects of the invention. Figure 3A shows regulation of gene transcription using a
fusion molecule according to one aspect of the invention. Figure 3B shows
modulation of a cell signaling pathway according to another aspect of the invention.
Figure 3C shows drug delivery mediated by a fusion molecule to a cell expressing a
marker of a pathology. Figure 3D shows the use of fusion molecules for drug
transport to an intracellular compartment. Figure 3E shows delivery of a
conditionally toxic fusion molecule to a cell. Figure 3F shows the use of a fusion
molecule for metabolic engineering. Figure 3G shows a fusion molecule according to
one aspect of the invention which functions as a biosensor.

Figure 4 shows a fusion molecule according to one aspect of the invention
which comprises the transferrin domain transport sequence and a methotrexate
binding sequence (e.g., dihydrofolate reductase). Outside the cell, the
transferrin domain of the "Trojan horse" fusion protein binds iron and the drug
binding domain binds methotrexate. The fusion protein interacts with the transferrin
receptor and is endocytosed. A decrease in pH in the endosome causes a
conformational change in the transferrin domain resulting in a conformational change
in the drug binding domain which occurs concomitant with drug release. The fusion
is recycled back outside of the cell to repeat the cycle.

Figures 5A-C show a strategy for engineering a switch molecule by generating
a conditional heterodimer. Figure 5A shows bisecting a polypeptide whose function
is to be controlled into two fragments that cannot functionally associate by
themselves. Figure 5B shows selection of molecules which functionally associate
when fused to dimerization domains. Figure 5C shows dimerization which occurs in
response to a signal according to one aspect of the invention.

Figures 6A-B show strategies to generate libraries of fusion molecules 
comprising bisected polypeptides fused to oligomerization domains. Figure 6A 
shows a method for generating libraries of such molecules. Figure 6B shows the 
addition of dimerization domains. 

Figure 7A shows the frequency of active heterodimers of Neo identified from
a library of fusion molecules whose assembly is assisted by antiparallel leucine
zippers. Figure 7B is a graph summarizing sequence data obtained from libraries
comprising heterodimers as in Figure 7A. Sequences falling on the diagonal line in
the graph have no overlap or deletion between fragments. Sequences of heterodimers
above the line have overlapping sequences, while those below the line have deleted
amino acids. In a library without a flexible linker, sequencing of sixteen randomly
selected colonies from kanamycin plates resulted in the identification of ten different
heterodimers of Neo (indicated by the large cross) whose assembly is assisted by
antiparallel leucine zippers. In a library with a GSGG flexible linker, sequencing of
six randomly selected colonies from kanamycin plates resulted in the identification of
four different heterodimers of Neo (indicated by the thin-line cross).
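
The three categories plotted in Figure 7B (no overlap or deletion, overlapping sequences, deleted amino acids) can be expressed as a small classification sketch. The coordinate convention used here, the last residue of the N-terminal fragment and the first residue of the C-terminal fragment, is an assumption made for illustration and may not match the axes of the figure; the function name is likewise hypothetical.

```python
from typing import Tuple


def classify_bisection(n_frag_last: int, c_frag_first: int) -> Tuple[str, int]:
    """Classify a two-fragment heterodimer by comparing fragment boundaries.

    n_frag_last:  residue number at which the N-terminal fragment ends (assumed input)
    c_frag_first: residue number at which the C-terminal fragment begins (assumed input)
    Returns the category and the number of residues shared or missing.
    """
    gap = c_frag_first - n_frag_last - 1
    if gap == 0:
        return ("contiguous", 0)    # fragments abut: no overlap, no deletion
    if gap < 0:
        return ("overlap", -gap)    # residues present in both fragments
    return ("deletion", gap)        # residues present in neither fragment
```

For example, under this convention fragments ending at residue 59 and starting at residue 55 share five residues, while fragments ending at 59 and starting at 63 lack three.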

Figure 8 shows the effect of sugars on a T164-165 β-lactamase:maltose
binding protein (MBP) fusion's hydrolysis of nitrocefin. The fusion comprises an
insertion of β-lactamase amino acid sequences into an MBP acceptor polypeptide
with a tandem duplication of amino acids 164-165 of MBP at the insertion site. The
velocity of nitrocefin hydrolysis with 150 µM nitrocefin and 5 mM of the indicated
sugars was compared to the velocity without any sugar. Sugars known not to bind
wildtype MBP (sucrose) and those that bind to MBP, but do not induce a
conformational change (maltitol and β-cyclodextrin) did not have a significant effect
on nitrocefin hydrolysis. All sugars known to bind to wildtype MBP and induce a
conformational change (maltose, maltotriose and maltohexaose) increase the rate of
hydrolysis by approximately 40%.

Detailed Description

The invention provides molecular switches which couple external signals to
functionality and methods of making and using the same. The switches according
to the invention can be used, for example, to regulate gene transcription, target drug
delivery to specific cells, transport drugs intracellularly, control drug release, provide
conditionally active proteins, perform metabolic engineering, and modulate cell
signaling pathways. Libraries comprising the switches and expression vectors and
host cells for expressing the switches are also provided.



Definitions

The following definitions are provided for specific terms which are used in the 
following written description. 

As used herein, a "molecular switch" refers to a molecule which generates a
measurable change in state in response to a signal. In one aspect, a molecular switch
is capable of switching from at least one state to at least one other state in response to
the signal. Preferably, when a portion of the molecule responds to the signal, the
portion becomes activated (i.e., turns "ON") or inactivated (i.e., turns "OFF"). In
response to this change in state, the state of another portion of the fusion molecule
will change (e.g., turn ON or OFF). In one aspect, a switch molecule turns ON one
portion of the molecule when another portion is turned OFF. In another aspect, the
switch turns ON one portion of the molecule when the other portion is turned ON. In
still another aspect, the switch molecule turns OFF one portion of the molecule when
the other portion is turned ON. In a further aspect, the switch molecule turns OFF one
portion of the molecule when the other portion is turned OFF. In some aspects of the
invention, a molecular switch exists in more than two states, i.e., not simply ON or OFF.
For example, a portion of the fusion molecule may display a series of states (e.g.,
responding to different levels of signal), while another portion of the fusion molecule
responds at each state, with a change in one or more states. A molecular switch also can
comprise a plurality of fusion molecules responsive to a signal and which mediate a
function by changing the state of at least a portion of the molecule (preferably, in
response to a change in state of another portion of the molecule). While the states of
individual fusion molecules in the population may be ON or OFF, the aggregate
population of molecules may not be able to mediate the function unless a threshold
number of molecules switch states. Thus, the "state" of the population of molecules may be
somewhere in between ON or OFF depending on the number of molecules which
have switched states. In one aspect, a molecular switch comprises a heterogeneous
population of fusion molecules comprising members which switch states upon
exposure to different levels of signal. In other aspects of the invention, however, the
state of a single molecule may be somewhere in between ON or OFF. For example, a
molecule may comprise a given level of activity, ability to bind, etc., in one state
which is switched to another given level of activity, ability to bind, etc., in another
state (i.e., an activity, ability to bind, etc., measurably higher or lower than the
activity, ability to bind, etc., observed in the previous state).

As used herein, a "state" refers to a condition of being. For example, a "state
of a molecule" or a "state of a portion of a molecule" can be a conformation, binding
affinity, or activity (e.g., including, but not limited to, ability to catalyze a substrate,
ability to emit light, transfer electrons, transport or localize a molecule, modulate
transcription, translation, replication, supercoiling, and the like).

As defined herein, a molecule, or portion thereof, whose state is "activated"
refers to a molecule or portion thereof which performs an activity, such as catalyzing
a substrate, emitting light, transferring electrons, transporting
or localizing a molecule; changes conformation; binds to a molecule, etc.

As defined herein, a molecule, or portion thereof, whose state is "inactivated"
refers to a molecule or portion thereof which is, at least temporarily, unable to
perform an activity or exist in a particular state (e.g., bind to a molecule, change
conformation).

As used herein, "coupled" refers to a state which is dependent on another state
such that a measurable change in the other state is observed. As used herein,
"measurable" refers to a state that is significantly different from a baseline or a previously
existing state as determined in a suitable assay using routine statistical methods (e.g.,
setting p<0.05).
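
As one illustration of how "measurable" might be checked against the p<0.05 criterion, the sketch below applies an exact two-sided permutation test on the difference of means. The choice of a permutation test is an assumption made for this example; the text only calls for routine statistical methods, and the function name and replicate values are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(baseline: Sequence[float], observed: Sequence[float]) -> float:
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(baseline) + list(observed)
    group_size = len(baseline)
    actual = abs(mean(observed) - mean(baseline))
    hits = 0
    total = 0
    # Try every way of relabeling the pooled measurements into two groups.
    for indices in combinations(range(len(pooled)), group_size):
        chosen = set(indices)
        group_a = [pooled[i] for i in indices]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_b) - mean(group_a)) >= actual - 1e-12:
            hits += 1
    return hits / total


# Hypothetical replicate activities without and with a signal.
without_signal = [1.0, 1.1, 0.9, 1.05]
with_signal = [1.4, 1.5, 1.35, 1.45]
is_measurable = permutation_p_value(without_signal, with_signal) < 0.05
```

With four replicates per group there are 70 possible relabelings, so completely separated groups give the smallest attainable two-sided p-value of 2/70, which is below 0.05.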

As used herein, "a signal" refers to a molecule or condition that causes a
reaction. Signals include, but are not limited to, the presence, absence, or level of
molecules (nucleic acids, proteins, peptides, organic molecules, small molecules),
ligands, metabolites, ions, organelles, cell membranes, cells, organisms (e.g.,
pathogens), and the like; as well as the presence, absence, or level of chemical,
optical, magnetic, or electrical conditions, and can include conditions such as degrees
of temperature and/or pressure. A chemical condition can include a level of ions, e.g.,
pH.

As used herein, "responsive to a signal" refers to a molecule whose state is 
coupled to the presence, absence, or level of the signal.

As used herein, "an insertion sequence" refers to a polymeric sequence which
is contained within another polymeric sequence (e.g., an "acceptor sequence") and
which conditionally alters the state of the other polymeric sequence. An insertion
sequence or acceptor sequence can comprise a polypeptide sequence, nucleic acid
sequence (DNA sequence, aptamer sequence, RNA sequence, ribozyme sequence,
hybrid sequence, modified or analogous nucleic acid sequence, etc.), carbohydrate
sequence, and the like.

As used herein, "multistable" refers to a fusion molecule which is capable of 
existing in at least two states. 

As used herein, "bistable" refers to a fusion molecule capable of existing in
two states.

As used herein, "range of states" refers to a series of states in which a fusion 
molecule can exist. For example, a range of states can comprise a range of binding 
activities, a range of light-emitting activities, a range of catalysis efficiencies, and the
like.

As used herein, "a change in state" refers to a measurable difference in a state 
of being of a molecule, as determined by an assay appropriate for that state.

As used herein, "a graded response" refers to the ability of a fusion molecule 
to switch to a series of states in response to a particular threshold signal.

As used herein, "modulates" or "modulated" refers to a measurable change in a
state or activity or function. Preferably, where an activity is being described,
"modulated" refers to an at least 2-fold, at least 5-fold, at least 10-fold, at least 20-fold
or higher, increase or decrease in activity, or an at least 10%, at least 20%, at least
30%, at least 40% or at least 50% increase or decrease in activity. However, more
generally, any difference which is measurable and statistically different from a
baseline is encompassed within the term "modulated".
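
By way of illustration only, the fold and percentage thresholds recited for
"modulated" can be expressed as a short calculation. This is a minimal sketch and
not part of the disclosed methods: the function names and tier tuples are
hypothetical, and the sketch does not perform the statistical comparison against
a baseline that the more general definition relies on.

```python
# Hypothetical helper names; the tiers mirror the thresholds recited in the text.
FOLD_TIERS = (2, 5, 10, 20)
PERCENT_TIERS = (10, 20, 30, 40, 50)


def fold_change(baseline: float, observed: float) -> float:
    """Fold increase or decrease between two positive activities (always >= 1)."""
    if baseline <= 0 or observed <= 0:
        raise ValueError("activities must be positive")
    return max(baseline, observed) / min(baseline, observed)


def percent_change(baseline: float, observed: float) -> float:
    """Absolute percentage increase or decrease relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return abs(observed - baseline) * 100 / baseline


def highest_tier_met(value: float, tiers=FOLD_TIERS):
    """Largest recited threshold that `value` reaches, or None if it reaches none."""
    met = [t for t in tiers if value >= t]
    return max(met) if met else None
```

For example, an activity that rises from 10 to 50 arbitrary units is a 5-fold
change and so meets the "at least 5-fold" tier but not the "at least 10-fold" tier.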

As used herein, a "less active state" is a state which is at least about 2-fold less
active compared to a given reference state as measured using an assay suitable for
measuring that state, or at least about 10%, at least about 20%, at least about 30%, at
least about 40%, at least about 50%, at least about 60%, at least about 70%, at least
about 80%, at least about 90% or at least about 100% less active. More generally, any
decrease which is measurable and statistically different from baseline is encompassed
within the term "less active state".

As used herein, a "less toxic state" refers to a measurable increase in the LD50
(i.e., lethal dose which has a 50% probability of causing death) or LC50 (i.e., lethal 
concentration which has a 50% probability of causing death). Preferably, a less toxic 
state is one which is associated with an at least about 10% increase, at least about 
20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at 
least about 70%, at least about 80%, at least about 90% or at least about 100% 
increase in LD50 or LC50.

As used herein, "a bio-effective molecule" refers to a bioactive molecule which
can have an effect on the physiology of a cell or which can be used to image a cell. In
one aspect, a "bio-effective molecule" is a pharmaceutical agent or drug or other 
material that has a therapeutic effect on the cell. 

As used herein, "a cellular marker of a pathological condition" refers to a 
molecule which is associated with a cell, e.g., intracellularly or extracellularly, and 
whose presence or level correlates with the presence of the disease, i.e., the marker is 
found in, or on cells, or is secreted by cells, exhibiting the pathology at levels which 
are significantly different from those observed for cells not exhibiting the pathology.

As used herein, "a molecular pathway molecule" refers to a molecule whose 
activity and/or expression affects the activity and/or expression of at least two other 
molecules. Preferably, a molecular pathway molecule is a molecule involved in a
metabolic or signal transduction pathway. A pathway molecule can comprise a
protein, polypeptide, peptide, small molecule, ion, cofactor, organic and inorganic
molecule, and the like.

As used herein, "modulating a molecular pathway" refers to a change in the 
expression and/or activity of at least one pathway molecule. 

As used herein, "at an insertion site" of a nucleic acid molecule refers to from 
about 1 to 21 nucleotides immediately flanking the insertion site. 

WO 03/078575    PCT/US03/07380

As used herein, "randomly inserting" refers to insertion at non-selected sites in
a polymeric sequence. In one aspect, "random insertion" refers to insertion that
occurs in a substantially non-biased fashion, i.e., there is a substantially equal
probability of inserting between the members of any pair of adjacent monomers (e.g.,
nucleotides or amino acids) in an acceptor molecule comprising a given number of
monomers. However, in another aspect, random insertion has some degree of bias,
e.g., there is a greater probability of inserting at some sites than at others. Minimally,
the probability of insertion at a site in an acceptor sequence is greater than zero but
less than one.
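
The unbiased and biased cases described above can be sketched as follows. This is
an illustrative model only, under the assumption that an acceptor of n monomers
offers n-1 internal sites (one between each pair of adjacent monomers); the
function names and the per-site weights are hypothetical and are not drawn from
the specification.

```python
from typing import List, Optional, Sequence


def insertion_site_probabilities(
    n_monomers: int, weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Probability of inserting at each of the n-1 sites between adjacent monomers.

    With no weights, every site is equally likely (the non-biased case).
    Positive weights model a biased insertion (hypothetical values).
    """
    n_sites = n_monomers - 1
    if n_sites < 1:
        raise ValueError("need at least two monomers to have an internal site")
    if weights is None:
        weights = [1.0] * n_sites
    if len(weights) != n_sites or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per site")
    total = sum(weights)
    return [w / total for w in weights]


def insert_at(acceptor: str, insertion: str, site: int) -> str:
    """Insert `insertion` between acceptor[site-1] and acceptor[site]."""
    if not 1 <= site <= len(acceptor) - 1:
        raise ValueError("site must lie between two monomers")
    return acceptor[:site] + insertion + acceptor[site:]
```

Because every weight is required to be positive, each site keeps a probability
greater than zero, and with two or more sites each probability is also less than one.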

As used herein, "a new activity" refers to an activity which is not found in
either donor or acceptor sequences. Generally, fusion molecules according to the
invention comprise a new activity in that the activity of the acceptor sequence or
insertion sequence is newly coupled to the state of the respective other portion of the
sequence. An insertion or acceptor sequence also may comprise a catalytic site which
responds to (e.g., catalyzes) a substrate provided in the form of the respective other
portion of the fusion molecule, thereby producing a fusion molecule which comprises
an activity present in neither the original catalytic site nor the substrate (e.g., such as
the ability to self-cleave in the presence of a signal).

As used herein, "a nuclear regulatory sequence" refers to a nucleic acid
sequence which is capable of modulating the activity of another nucleic acid in cis or
in trans. Types of activities regulated include, but are not limited to, modulating
transcription, translation, replication, recombination, or supercoiling. A nucleic acid
regulatory sequence can include promoter elements, operator elements, repressor
elements, enhancer sequences, ribosome binding sites, IRES sequences, origins of
replication, recombination hotspots, topoisomerase binding sequences, and the like.

As used herein, "altered by bisection" refers to a change in state upon
fragmenting a polypeptide into two pieces. The term "bisection" does not imply that
the polypeptide is divided into fragments of equal size; rather, fragments can be
generated by cleaving anywhere along the length of the primary amino acid
sequence.
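
To make the point about unequal fragments concrete, the following minimal sketch
enumerates every way of splitting a primary sequence into two non-empty pieces.
The function name and the example sequence are hypothetical and serve only as an
illustration of the definition.

```python
from typing import List, Tuple


def bisections(polypeptide: str) -> List[Tuple[str, str]]:
    """All (N-terminal fragment, C-terminal fragment) pairs for a sequence.

    A sequence of n residues can be cleaved at n-1 positions, so the two
    fragments are generally of unequal size.
    """
    if len(polypeptide) < 2:
        raise ValueError("need at least two residues to bisect")
    return [(polypeptide[:i], polypeptide[i:]) for i in range(1, len(polypeptide))]
```

For a four-residue sequence such as "MKTA", the sketch yields three fragment
pairs, only one of which divides the sequence into equal halves.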

As used herein, "selecting for restoration of function or state" refers to
selection for restoration of a function or state which is sufficiently similar to that of
the original function under assay conditions suitable for evaluating the function or
state. As used herein, "sufficiently similar" refers to a state that can achieve the
original function in an effective manner. For example, when the function/state is
binding, restoration of function/state can be evaluated by generating Scatchard plots
and/or determining Kd. When the function/state is the ability of a molecule to
generate light, restoration can be measured spectrophotometrically, for example.

As used herein, a "modification" of a polypeptide refers to an addition,
substitution or deletion of one or more amino acids in a polypeptide which does not
substantially alter the state of the polypeptide. For example, where a state is an
activity of a polypeptide, a modification results in no more than a 10% decrease or
increase in the activity of the polypeptide, and preferably no more than a 5% decrease
or increase in the activity of the polypeptide.

Fusion Molecules

Domain Insertion 

In one aspect, a fusion molecule is provided which comprises an insertion
sequence and an acceptor sequence which contains the insertion sequence (see, Figure
1B). Preferably, the insertion sequence and acceptor sequence are polymeric
molecules, e.g., such as polypeptides or nucleic acids. More preferably, both the
insertion sequence and acceptor sequence are capable of existing in at least two states
and the state of the insertion sequence is coupled to the state of the acceptor sequence
upon fusion, such that a change in state in either the insertion sequence or acceptor
sequence will result in a change in state of the respective other portion of the fusion. A
"state" can be a conformation; binding affinity; ability or latent ability to catalyze a
substrate; ability or latent ability to emit light; ability or latent ability to transfer
electrons; ability or latent ability to withstand degradation (e.g., by a protease or
nuclease); ability or latent ability to modulate transcription; ability or latent ability
to modulate translation; ability or latent ability to modulate replication; ability or
latent ability to initiate or mediate recombination or supercoiling; or otherwise
perform a function; and the like.
Preferably, the change in state is triggered by a signal to which the fusion
molecule is exposed, e.g., such as the presence, absence, or amount of a small
molecule, ligand, metabolite, ion, organelle, cell membrane, cell, organism (e.g., such
as a pathogen), temperature change, pressure change, and the like, to which the fusion
molecule binds; a change in a condition, such as pH, or a change in the chemical,
optical, electrical, or magnetic environment of the fusion molecule. In one aspect, a
fusion molecule functions as an ON/OFF switch in response to a signal (e.g.,
changing from one state to another). For example, when an insertion sequence or
acceptor sequence of the fusion molecule binds to a ligand, the respective other half
of the fusion may change state (e.g., change conformation, bind to a molecule, release
a molecule to which it is bound, catalyze a substrate or stop catalyzing a substrate,
emit light or stop emitting light, transfer electrons or stop transferring electrons,
activate or inhibit transcription, translation, replication, etc.).

However, fusion molecules according to the invention also can be used to
generate graded responses. In this scenario, a fusion molecule can switch among a
series of states (e.g., more than two different types of conformations, levels of
activity, degrees of binding, levels of light transmission, electron transfer,
transcription, translation, replication, etc.). Preferably, the difference in state is one
which can be distinguished readily from other states (e.g., there is a significant
measurable difference between one state and any other state, as determined using
assays appropriate for measuring that state).

More generally, a molecular switch is one which generates a measurable
change in state in response to a signal. For example, a molecular switch can comprise
a plurality of fusion molecules each responsive to a signal and for mediating a
function in response to a change in state of at least a portion of the molecule. As
above, preferably, this change of state occurs in response to a change in state of
another portion of the molecule. While the states of individual fusion molecules in
the population may be ON or OFF, the aggregate population of molecules may not be
able to mediate the function unless a threshold number of molecules switch states.
Thus, the "state" of the population of molecules may be somewhere in between ON and
OFF, depending on the number of molecules which have switched states. This
provides an ability to more precisely tune a molecular response to a signal by
selecting for molecules which respond to a range of signals and modifying the
population of fusion molecules to provide selected numbers of fusion molecules,
providing an aggregate switch which responds to a narrow range or wider range of
signal as desired. Thus, in one aspect, a heterogeneous population of fusion
molecules is provided comprising members which respond to different levels or
ranges of signals. Individual fusion molecules also may exist in states intermediate
between ON and OFF, e.g., having a given level of activity or ability to bind to a
molecule in one state and a measurably higher or lower level of activity, ability to
bind, etc., in a different state.
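
The aggregate behavior described above can be pictured with a toy model. The
sketch below is illustrative only and rests on simplifying assumptions that are
not in the specification: each fusion molecule is treated as strictly ON or OFF,
each has a single hypothetical signal threshold, and the population mediates its
function once a chosen fraction of molecules is ON. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SwitchMolecule:
    """A bistable fusion molecule that turns ON at or above its signal threshold."""
    threshold: float  # hypothetical signal level, arbitrary units

    def is_on(self, signal: float) -> bool:
        return signal >= self.threshold


def fraction_on(population: Sequence[SwitchMolecule], signal: float) -> float:
    """Fraction of the population that has switched ON at the given signal level."""
    if not population:
        raise ValueError("population must not be empty")
    return sum(m.is_on(signal) for m in population) / len(population)


def population_mediates_function(
    population: Sequence[SwitchMolecule], signal: float, required_fraction: float
) -> bool:
    """True once at least `required_fraction` of the molecules are ON."""
    return fraction_on(population, signal) >= required_fraction
```

In this model, a population whose thresholds are clustered closely together
responds over a narrow range of signal, while a population with widely spread
thresholds responds gradually over a wider range.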

Insertion Sequences

The size of the insertion will vary depending on the size of an insertion
sequence required to confer a particular state on the insertion sequence without
significantly disrupting the ability of the acceptor molecule into which it is inserted to
change state. Preferably, the effect of the insertion is to couple the change in state of
the acceptor molecule to a change in state of the insertion molecule or vice versa.

Generally, for polypeptide insertions, the size of the insertion sequence can
range from about two amino acids to at least about 120 amino acids. In one aspect,
the insertion comprises a domain sequence with a known characterized activity (e.g.,
a portion of a protein in which bioactivity resides); however, in other aspects, the
insertion sequence comprises an entire protein sequence.

In one aspect, the insertion sequence is a polypeptide whose folded 
conformation is such that the N- and C- termini are "on the same face" of a fusion 
molecule comprising the insertion sequence. 

Acceptor Sequences

Generally, there are no constraints on the size or type of acceptor sequence
which can be used. However, in one aspect, an acceptor sequence is a polypeptide
whose state resides in a discontinuous domain of a protein (e.g., the amino acids
involved in conferring the state/activity of the acceptor sequence are not necessarily
contiguous in the primary polypeptide sequence) (see, e.g., as described in Russell
and Ponting, 1998, Curr. Opin. Struct. Biol. 8: 364-371, and Jones, et al., 1998,
Protein Sci. 7: 233-42).

Suitable polypeptides for acceptor molecules can be identified using domain
assignment algorithms such as are known in the art (e.g., such as the PUU,
DETECTIVE, DOMAK, and DomainParser programs). For example, a consensus
approach may be used as described in Jones, et al., 1998, supra. Information also can
be obtained from a number of molecular modeling databases such as the NIH
Molecular Modelling Homepage, accessible at
http://cmm.info.nih.gov/modeling/pdb_at_a_glance.html, or the 3Dee Database
described by Dengler, et al., 2001, Proteins 42(3): 332-44. However, the most
important criterion used for selecting a sequence is its function, e.g., the desired state
parameters of the fusion molecule.

However, in a further aspect, no pre-screening is done and an acceptor 
sequence is selected simply on the basis of a desired activity. The power of the 
15 methods according to the invention is that they rely on combinatorial screening to 
identify any, and preferably, all, combinations of insertions that produce a desired
coupling in states of acceptor and donor molecules. 

Domain Sequences 

In one aspect, the insertion sequence or acceptor sequence comprises a
"domain" sequence having a known state. Domains can be minimal sequences, such
as are known in the art, which are associated with a particular known state or can be
an entire protein comprising the domain or a functional fragment thereof.

Minimal domain sequences can be defined by site-directed mutagenesis of a
sequence having a desired state to determine the minimum amino acids necessary to
confer the existence of the state under the appropriate conditions (e.g., such as a
minimal binding site sequence or a minimum sequence necessary for catalysis, light
emission, etc.). As discussed above, minimal domain sequences also can be defined
virtually, using algorithms to identify consensus sequences or areas of likely protein
folding. Once a domain sequence has been identified, it can be modified to include
additional sequences, as well as insertions, deletions, and substitutions of amino acids
so long as they do not substantially affect the state of the domain sequence. While
domain sequences can be obtained using nucleic acids encoding appropriate
fragments of polypeptides, they also can be synthesized, for example, based on a
predicted consensus sequence for a class of molecules which is associated with a
particular state. However, as discussed above, in some cases it may be desirable to
provide the domain sequence in the form of a native protein comprising the domain.

Suitable domain sequences include extracellular domains which are portions 
of proteins normally found outside of the plasma membrane of a cell. Preferably, 
such domains bind to bio-effective molecules. For example, an extracellular domain 
can include the extracytoplasmic portion of a transmembrane protein, a secreted 
protein, a cell surface targeting protein, a cell adhesion molecule, and the like. In one
aspect, an extracellular domain is a clustering domain, which, upon activation by a
bio-effective molecule, will dimerize or oligomerize with other molecules comprising
extracellular domains. 

Intracellular domains also can serve as insertion sequences or acceptor
sequences. As used herein, "an intracellular domain" refers to a portion of a protein
which generally resides inside of a cell with respect to the cellular membrane. In one
aspect, an intracellular domain is one which transduces an extracellular signal into an
intracellular response. For example, an intracellular domain can comprise a
proliferation domain which signals a cell to enter mitosis (e.g., such as domains from
Jak kinase polypeptides, IL-2 receptor beta and/or gamma chains, and the like). Other
transducer sequences include sequences from the zeta chain of the T cell receptor or
any of its homologs (e.g., the eta chain, Fc epsilon RI-gamma and -beta chains, MB1
chain, B29 chain, and the like), CD3 polypeptides (gamma, beta and epsilon), syk
family tyrosine kinases (Syk, ZAP 70, and the like), and src family tyrosine kinases
(Lck, Fyn, Lyn, and the like).

A transmembrane domain also can be used as an insertion sequence or 
acceptor sequence. Preferably, a transmembrane domain is able to cross the plasma
membrane and can, optionally, transduce an extracellular signal into an intracellular
response. Preferred transmembrane sequences include, but are not limited to,
sequences derived from CD8, ICAM-2, IL-8R, CD4, LFA-1, and the like.
Transmembrane sequences also can include GPI anchors, e.g., such as the
DAF sequence (PNKGSGTTSGTTRLI^GHTCTTLTGLLGTLV™) (see,
e.g., Romans, et al., 1988, Nature 333(6170): 269-72; Moran, et al., 1991, J. Biol.
Chem. 266: 1250); myristylation sequences (e.g., such as the src sequence
MGSSKSKPKDPSQR) (see Cross, et al., 1984, Mol. Cell. Biol. 4(9): 1834; Spencer,
et al., 1993, Science 262: 1019-1024); and palmitoylation sequences (e.g., such as the
GRK6 sequence LLQRLFSRQDCCGNCSDSEEELPTR).

Either the insertion sequence or the acceptor sequence can be a localization
sequence for localizing a molecule comprising the sequence intracellularly. In one
aspect, the localization sequence is a nuclear localization sequence. Generally, a
nuclear localization sequence is a short, basic sequence that serves to direct a
polypeptide in which it occurs to a cell's nucleus (Laskey, 1986, Ann. Rev. Cell Biol.
2: 367-390; Bonnerot, et al., 1987, Proc. Natl. Acad. Sci. USA 84: 6795-6799; Galileo,
et al., 1990, Proc. Natl. Acad. Sci. USA 87: 458-462). Suitable nuclear
localization sequences include, but are not limited to, the SV40 (monkey virus) large
T Antigen sequence (PKKKKKV) (see, e.g., Kalderon, et al., 1984, Cell 39: 499-509);
the human retinoic acid receptor nuclear localization signal (ARRRRP); the NF kB
p50 sequence (EEVQRKRQKL) (Ghosh et al., 1990, Cell 62: 1019); the NF kB p65
sequence (EEKRKRTYE) (Nolan et al., 1991, Cell 64: 961); and nucleoplasmin (Ala
Val Lys Arg PAATLKKAGQAKKKKLD) (Dingwall, et al., 1982, Cell 30: 449-458).

The localization sequence can comprise a signaling sequence for inserting at
least a portion of the fusion molecule into the cell membrane. Suitable signal
sequences include residues 1-26 of the IL-2 receptor beta-chain (see, Hatakeyama et
al., 1989, Science 244: 551; von Heijne et al., 1988, Eur. J. Biochem. 174: 671);
residues 1-27 of the insulin receptor beta chain (see, Hatakeyama, et al., 1989, supra);
residues 1-32 of CD8 (Nakauchi, et al., 1985, PNAS USA 82: 5126) and residues 1-21
of ICAM-2 (Staunton, et al., 1989, Nature (London) 339: 61).

The localization sequence also can comprise a lysosomal targeting sequence,
including, for example, a lysosomal degradation sequence such as Lamp-2 (KFERQ)
(see, e.g., Dice, 1992, Ann. N.Y. Acad. Sci. 674: 58); a lysosomal membrane sequence
from Lamp-1 (MLIPIAGFFAIAGLVLIVLIAYLIGRKRSHAGYQTI) (e.g.,
Uthayakumar, et al., 1995, Cell Mol. Biol. Res. 41: 405) or Lamp-2
(LVPIAVGAALAGVLILVLLAYHGLKHHHAGYEQF) (e.g., Konecki et al., 1994,
Biochem. Biophys. Res. Comm. 205: 1-5).

Alternatively, the localization sequence can comprise a mitochondrial
localization sequence, including, but not limited to: mitochondrial matrix sequences,
such as the MLRTSSLFTRRVQPSLFSRNILRLQST sequence of yeast alcohol
dehydrogenase III (Schatz, 1987, Eur. J. Biochem. 165: 1-6); mitochondrial inner
membrane sequences, such as the MLSLRQSIRFFKPATRTLCSSRYLL sequence of yeast
cytochrome c oxidase subunit IV (Schatz, 1987, supra); mitochondrial intermembrane
space sequences, such as the
MFSMLSJCRWAQRTLSKSFYSTATGAASKSGKLTQKLVTAGVAAAGITASTLL
YADSLTAEAMTA sequence of yeast cytochrome c1 (Schatz, 1987, supra); or
mitochondrial outer membrane sequences, such as the
MKSFITRNKTAILATVAATGTAIGAYYYYNQLQQQQQRGKK sequence of
yeast 70 kD outer membrane protein (see, e.g., Schatz, supra).

Other suitable localization sequences include endoplasmic reticulum
localizing sequences, such as KDEL from calreticulin (e.g., Pelham, 1992, Royal
Society London Transactions B: 1-10) or the adenovirus E3/19K protein sequence
LYLSRRSFIDEKKMP (Jackson et al., 1990, EMBO J. 9: 3153); and peroxisome
targeting sequences, such as the peroxisome matrix sequence (SKL) from Luciferase
(Keller et al., 1987, Proc. Natl. Acad. Sci. USA 84: 3264).

In another aspect, the insertion sequence or acceptor sequence comprises a
secretory signal sequence capable of effecting the secretion of the fusion molecule
from a cell (see, e.g., Silhavy, et al., 1985, Microbiol. Rev. 49: 398-418). This may be
useful for generating a switch molecule which can affect the activity of a cell other
than a host cell in which it is expressed. Suitable secretory sequences include, but are
not limited to, the MYRMQLLSCIALSLALVTNS sequence of IL-2 (Villinger, et al.,
1995, J. Immunol. 151: 3946); the MATGSRTSLLLAFGLLCLPWLQEGSAFPT
sequence of growth hormone (Roskam et al., 1979, Nucleic Acids Res. 7: 30); the
MALWMRIXPLLALLALWGPDPAAAFVN sequence of preproinsulin (Bell, et al.,
1980, Nature 284: 26); the influenza HA protein sequence,
MKAKLLVLLYAFVAGDQI (Sekiwawa, et al., Proc. Natl. Acad. Sci. USA 80:
3563); or the signal leader sequence from the secreted cytokine IL4,
MGLTSQLLPPLFFLLACAGNFVHG.

In a further aspect, the insertion sequence or acceptor sequence comprises a
domain for binding a nucleic acid. The domain can comprise a DNA binding
polypeptide or active fragment thereof from a prokaryote or eukaryote. For example,
the domain can comprise a polypeptide sequence from a prokaryotic DNA binding
protein such as gp 32; a domain from a viral protein, such as the papilloma virus E2
protein; or a domain from a eukaryotic protein, such as p53, Jun, Fos, GCN4, or
GAL4. Novel DNA binding proteins also can be generated by mutagenic techniques
(see, e.g., as described in U.S. Patent No. 5,198,346).

The insertion sequence or acceptor sequence also can comprise the Ca2+
binding domain of a Ca2+ binding protein such as calmodulin, parvalbumin, troponin,
annexin, and myosin or the ligand domain of a binding protein such as avidin,
concanavalin A, ferritin, fibronectin, an immunoglobulin, a T Cell Receptor, an MHC
Class I or Class II molecule, a lipid binding protein, a metal binding protein, a
chaperone, a G-Protein Coupled Receptor, and the like.

In addition, the insertion or acceptor sequence can comprise the transport
domain of a transport protein such as hemerythrin, hemocyanin, hemoglobin,
myoglobin, transferrin, lactoferrin, ovotransferrin, maltose binding protein and
transthyretin.

In another aspect, the insertion or acceptor sequence can comprise the active
domain of a blood coagulation protein (e.g., a domain which mediates blood clotting).
Exemplary blood clotting proteins include, but are not limited to: decorsin, factor IX,
factor X, kallikrein, plasmin/plasminogen, protein C, thrombin/prothrombin, and
tissue-type plasminogen activator.

In still another aspect, the insertion or acceptor sequence can comprise the 
active domain of an electron transport protein (e.g., a domain which confers electron 
transport activity on a protein). Electron transport proteins include, but are not 
limited to, amicyanin, azurin, a cytochrome protein, ferredoxin, flavodoxin,
glutaredoxin, methylamine dehydrogenase, plastocyanin, rubredoxin, and thioredoxin.
In a further aspect, the insertion sequence or acceptor sequence comprises the
catalytic and/or substrate binding site of an enzyme. Suitable enzymes from which
such sites are selected include: β-lactamase; acetylcholinesterase; an amylase;
barnase; deaminase; a kinase (e.g., such as a tyrosine kinase or serine kinase); a
phosphatase; an endonuclease; an exonuclease; an esterase; an enzyme involved in a
metabolic pathway (e.g., fructose-1,6-bisphosphatase); a glycosidase; a heat shock
protein; a lipase; a lysozyme; a neuraminidase/sialidase; a phospholipase; a
phosphorylase; a pyrophosphatase; a ribonuclease; a thiolase; a polymerase; an
isomerase (such as a mutase, triosephosphate isomerase, xylose isomerase,
topoisomerase, gyrase); a lyase (such as aconitase, carbonic anhydrase, pyruvate
decarboxylase); an oxidoreductase (such as alcohol dehydrogenase, aldose reductase,
a catalase, cytochrome C peroxidase, cytochrome p450, a dehydrogenase,
dihydrofolate reductase, glyceraldehyde-3-phosphate dehydrogenase, a
hydroxybenzoate hydroxylase, a lactate dehydrogenase, a peroxidase, and a
superoxide dismutase); a protease (such as actinidin, α-lytic protease, aminopeptidase,
carboxypeptidase, chymosin, chymotrypsin, elastase, endopeptidase, endothiapepsin,
HIV protease, Hannuka factor, papain, pepsin, rennin, subtilisin, thermolysin,
thermitase, and trypsin); a transferase (such as acetyltransferase, aminotransferase,
carbamoyltransferase, dihydrolipoamide acetyltransferase, dihydrolipoyl
transacetylase, dihydrolipoamide succinyltransferase, a nucleotidyl transferase, DNA
methyltransferase, formyltransferase, glycosyltransferase, a phosphotransferase, a
phosphoribosyltransferase); a dehalogenase; a racemase; and the like.

The catalytic domain also can be a rhodanese homology domain such as forms
the active site in various phosphatases and transferases (e.g., such as found in the
Cdc25 family of protein dual specificity phosphatases, the MKP1/PAC1 family of
MAP-kinase phosphatases, the Pyp1/Pyp2 family of MAP-kinase phosphatases, and
certain ubiquitin hydrolases) (see, e.g., Hofmann, et al., 1998, J. Mol. Biol. 282: 195-
208).

Still other domains can include toxins such as cardiotoxin, conotoxin,
erabutoxin, momorcharin, momordin, and ricin.

Other domains include, but are not limited to, signaling domains such as the
FHA domain, found in protein kinases and transcription factors such as fork head,
DUN1, RAD53, SPK1, cds1, MEK1, KAPP, NIPP1, Ki-67, fraH, and KIAA0170
(see, e.g., Hofmann and Bucher, 1995, Trends Biochem. Sci. 20: 347-349); the death
domain, a heterodimerization domain present in proteins involved in apoptotic signal
transduction and the NFkB pathway (such as TNFR1, FAS/APO1, NGFR,
MORT1/FADD, TRADD, RIP, ankyrin, MyD88, unc-5, unc-44, DAP-kinase, Rb-
binding p84, pelle, NFkB, and tube polypeptides) (see, e.g., Hofmann and Tschopp,
1995, FEBS Lett. 371: 321-323); and the G-protein desensitization domain (found in
ARK1, GRK, G-protein coupled receptor kinases, egl-10, GAIP, BL34, SST2, flbA,
RGP3, RGP4, Human G0/G1 switch regulatory protein 8, Human B-cell activation
protein BL34, and G-protein coupled receptor kinases) (see, e.g., Hofmann and
Bucher, "Conserved Sequence Domains in Cell Cycle Regulatory Proteins", abstract
presented at the joint ISREC/AACR meeting "Cancer and the Cell cycle", January
1996 in Lausanne).

In one aspect, either the insertion or the acceptor sequence is a light-emitting
polypeptide domain such as one obtained from a Green Fluorescent Protein, or
modified, or mutant form thereof (collectively referred to as a "GFP"). The wild-type
GFP is 238 amino acids in length (Prasher, et al., 1992, Gene 111(2); Cody,
et al., 1993, Biochem. 32(5): 1212-1218; Ormo, et al., 1996, Science 273: 1392-1395;
and Yang, et al., 1996, Nat. Biotech. 14: 1246-1251). Modified forms are described
in WO 98/06737 and U.S. Patent No. 5,777,079. GFP deletion mutants also can be
made. For example, at the N-terminus, it is known that only the first amino acid of
the protein may be deleted without loss of fluorescence, while at the C-terminus, up to
7 residues can be deleted without loss of fluorescence (see, e.g., Phillips, et al., 1997,
Current Opin. Structural Biol. 7: 821).

The insertion sequence or acceptor sequence additionally can comprise the
light-reactive portion of a photoreceptor such as bacteriochlorophyll-A,
bacteriorhodopsin, photoactive yellow protein, phycocyanin, and rhodopsin.

Additional domain sequences include ligand-binding domains of ligand-
binding proteins. Such proteins include, but are not limited to: biotin-binding proteins,
lipid-binding proteins, periplasmic binding proteins (e.g., maltose binding protein),
lectins, serum albumins, immunoglobulins, T Cell Receptors, inactivated enzymes,
pheromone-binding proteins, odorant-binding proteins, immunosuppressant-binding
proteins (e.g., immunophilins such as cyclophilins and FK506-binding proteins),
phosphate-binding proteins, sulfate-binding proteins, and the like. Additional binding
proteins are described in De Wolf and Brett, 2000, Pharmacological Reviews 52(2):
207-236.

The domain sequences of the proteins described above are known in the art
and can be obtained from a database such as available at the NIH Molecular
Modelling Homepage, accessible at
http://cmm.info.nih.gov/modeling/pdb_at_a_glance.html.

The insertion and acceptor sequences can be selected from any of the domain
sequences described above and can be of like kind (e.g., both catalytic sites, both
binding domains, both light emitting domains) or of different kind (e.g., a catalytic
site and a binding site, as shown in Figure 1C; a binding site and a light emitting
domain; etc.). The domain sequences can be the minimal sequences required to 
confer a state or activity or can comprise additional sequences. Other insertion and 
acceptor sequences can be derived from known domain sequences or from newly 
identified sequences. Such sequences are also encompassed within the scope of the 
instant invention. 

Exemplary Fusion Molecules 

In one aspect, the insertion sequence or the acceptor sequence localizes the 
fusion molecule intracellularly. Preferably, intracellular localization is coupled to the 
binding of the fusion molecule to a bio-effective molecule. 

In another aspect, the invention provides a fusion protein comprising an
insertion sequence and an acceptor sequence, wherein either the inserted sequence or
the acceptor sequence binds to a DNA molecule, and wherein DNA binding activity is
coupled to the response of the respective other sequence of the fusion molecule to a
signal.

The fusion molecule also can comprise an insertion sequence and acceptor
sequence, wherein either the inserted sequence or the acceptor sequence associates
with a bio-effective molecule, and disassociates from the bio-effective molecule
when the respective other sequence of the fusion binds to a cellular marker of a
pathological condition. Such markers can comprise polypeptides, nucleic acids,
glycoproteins, lipids, carbohydrates, small molecules, metabolites, pH, ions and the
like. Examples of cellular markers of pathological conditions include, but are not
limited to, cancer-specific or tumor-specific antigens and pathogen-encoded
polypeptides (e.g., viral-, bacterial-, protist-, and parasite-encoded polypeptides) as
are known in the art.

In still another aspect, the fusion molecule is capable of switching from a
non-toxic state to a toxic state. Either the insertion sequence or acceptor sequence may
bind to a cellular marker of a pathology (e.g., such as a tumor antigen). Binding of
the marker to the fusion protein switches the fusion protein from a non-toxic state or a
less toxic state to a toxic state. Similarly, a marker of a healthy cell could be used as
a trigger to switch a fusion molecule from a toxic state to a non-toxic state, or to a less
toxic state.

In a further aspect, the fusion molecule comprises a molecular switch for
controlling a cellular pathway. The fusion molecule comprises an insertion sequence
and an acceptor sequence and the states of the insertion sequence and acceptor
sequence are coupled, such that the state of either the insertion sequence or the
acceptor sequence modulates the activity or expression of a molecular pathway
molecule in a cell. Preferably, modulation of activity or expression occurs when the
respective other portion of the fusion molecule responds to a signal, e.g., binds to an
exogenous or endogenous binding molecule (e.g., ligands, small molecules, ions,
metabolites, and the like), responds to electrical or chemical properties of a cell, or
responds to the optical environment in which a cell is found (e.g., responding to the
presence or absence of particular wavelength(s) of light).

The invention also provides a sensor molecule comprising an insertion
sequence and an acceptor sequence, wherein either the insertion sequence or acceptor
sequence binds to a target molecule and wherein the respective other sequence
generates a signal in response to binding. Preferably, the acceptor sequence
comprises a deletion and/or duplication at the insertion site.

It should be obvious to those of skill in the art that these are only exemplary
combinations of insertion and acceptor sequences that can be used.

Additional Sequences

Fusion molecules can comprise domain sequences in addition to insertion and
acceptor sequences. Such domains can comprise states which may or may not be
coupled with the states of the other portions of the fusion molecule.

Additional sequences also can be included as part of the fusion molecule
which do not alter substantially the states of the insertion sequence or acceptor
sequence portion of the fusion molecule. For example, affinity tag sequences can be
provided to facilitate the purification or isolation of the fusion molecule. Thus, His6
tags can be employed (for use with nickel-based affinity columns), as well as epitope
tags (e.g., for detection, immunoprecipitation, or FACS analysis), such as myc, BSP
biotinylation target sequences of the bacterial enzyme BirA, flu tags, lacZ, GST, and
Strep tags I and II. Nucleic acids encoding such tag molecules are commercially
available.

Stability sequences can be added to the fusion molecule to protect the
molecule from degradation (e.g., by a protease). Suitable stability sequences include,
but are not limited to, glycine molecules incorporated after the initiation methionine
(e.g., MG or MGG) to protect the fusion molecule from ubiquitination; two prolines
incorporated at the C-terminus (conferring protection against carboxypeptidase
action), and the like.

In some aspects, the fusion molecule can include a linking or tethering
sequence between insertion and acceptor sequences or between insertion or acceptor
sequences and other domain sequences. For example, useful linkers include glycine
polymers, glycine-serine polymers, glycine-alanine polymers, alanine-serine
polymers, alanine polymers, and other flexible linkers as are known in the art (see,
e.g., Huston, et al., 1988, Proc. Natl. Acad. Sci. USA 85: 4879; U.S. Patent No.
5,091,513).

These additional sequences can be included to optimize the properties of the
fusion molecules described herein.

Generating Fusion Molecules Comprising Domain Insertions

In one aspect, libraries in which an insertion sequence has been randomly
inserted into an acceptor sequence are constructed. Preferably, such libraries are
generated by randomly inserting a nucleic acid fragment encoding an insertion
sequence into a nucleic acid fragment encoding an acceptor sequence.

All existing methods for random insertion can be categorized into one of two
strategies: insertion via transposons and insertion after a random double-stranded
break in DNA using one or a combination of nucleases. A variety of transposons
have been used to deliver short, in-frame insertions of 4-93 amino acids (e.g., Hayes
and Hallet, 2000, Trends Microbiol. 8: 571-7; Manoil and Traxler, 2000, Methods 20:
55-61). However, although transposons are an efficient method for delivering an
insertion, insertion methods are preferred which create libraries with direct insertions,
deletions at the insertion site, or variability in the amount of deletions or tandem
duplications or variability in the distribution of direct insertions, deletions and tandem
duplications.

Random insertion using nuclease treatment, on the other hand, can create such
libraries. These methods typically are used for the insertion of short sequences into a
target gene during linker scanning mutagenesis. These methods generally differ in the
strategy used to produce a random, double-strand break in supercoiled plasmid DNA
containing the gene to be inserted.

A number of different strategies can be used to create the fusion molecules of
the instant invention. These include, but are not limited to: (a) limited digestion with
DNaseI in the presence of Mn2+ to produce a single double-stranded break (Heffron, et
al., 1978, Proc. Natl. Acad. Sci. USA 75: 6012-6016); (b) limited digestion with
DNaseI in the presence of Mg2+ to produce a single nick, followed by S1 nuclease
treatment to cleave opposite the nick (Dykxhoorn, et al., 1997, Nucleic Acids Res. 25:
4209-18); (c) limited digestion with DNaseI with Mg2+ under conditions for nick
translation to take place, followed by S1 nuclease treatment to cleave opposite the
nick; and (d) partial apurination with formic acid and exonuclease III, which
introduces a single-strand gap at the apurinic site, followed by S1 nuclease treatment
to cleave opposite the gap (Luckow, et al., 1987, Nucleic Acids Res. 15: 417-429),
as summarized in Figure 2B. In method (b), the location of the double-strand
break is determined by the location of the DNaseI nicking, whereas in method (c) the
location of the double-strand break is determined by how far nick translation has
progressed. In addition to digestion by nucleases (e.g., DNase, S1, exonucleases,
restriction endonucleases and the like), other methods for introducing breaks in
sequences can be used. For example, mechanical shearing, chemical treatment,
and/or radiation can be used. Generally, the method for introducing breaks is not
intended to be limiting.

In a particularly preferred aspect, libraries of fusion molecules are generated
using incremental truncation (see, Patent Application by Ostermeier, "Incrementally
Truncated Nucleic Acids and Methods of Making the Same", Attorney Docket No.
7418/79492). As shown in Figure 2C, a key step in the creation of these libraries is
the digestion of the gene fragments with a 3' to 5' exonuclease such as Exonuclease
III (Exo III) under conditions (e.g., low temperature or in the presence of NaCl) such
that the digestion rate is controlled to ~10 bases/minute or less. During Exo III
digestion, small aliquots are removed frequently and quenched by addition to a low
pH, high salt buffer. Blunt ends are prepared by treatment with a single-strand
nuclease and a DNA polymerase followed by unimolecular ligation to recyclize the
vector. As Exo III digests DNA at a substantially uniform and synchronous rate (Wu,
et al., 1976, Biochemistry 15: 734-740), this allows the creation of a library
comprising every possible one base pair deletion of a gene or gene fragment.
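
As a rough planning aid for the time course described above, the following Python sketch estimates when aliquots would be withdrawn and how many bases Exo III would be expected to have removed at each time point. It assumes an idealized, perfectly uniform digestion rate. The function name aliquot_schedule, the 30-second sampling interval, and the 300 bp region length are illustrative assumptions and are not taken from this specification; only the rate of about 10 bases/minute comes from the text.

```python
import math


def aliquot_schedule(region_bp, rate_bp_per_min=10.0, interval_s=30.0):
    """Return a list of (time_min, expected_bases_removed), one per aliquot.

    Idealized model (an assumption): Exo III removes bases at a constant
    rate, so expected truncation length grows linearly with time.
    """
    if region_bp <= 0 or rate_bp_per_min <= 0 or interval_s <= 0:
        raise ValueError("all arguments must be positive")
    interval_min = interval_s / 60.0
    # Time needed to digest through the whole region at the given rate.
    total_min = region_bp / rate_bp_per_min
    n_aliquots = math.ceil(total_min / interval_min)
    return [
        (i * interval_min, min(region_bp, i * interval_min * rate_bp_per_min))
        for i in range(1, n_aliquots + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 300 bp region sampled every 30 s at 10 bases/minute.
    for time_min, removed in aliquot_schedule(300)[:3]:
        print(f"{time_min:.1f} min: about {removed:.0f} bases removed")
```

With these illustrative values, consecutive aliquots differ by an expected 5 bases. The sketch does not model the spread of truncation lengths within any single aliquot.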

Constructing a Target Vector Comprising Acceptor Sequences 

In one aspect, construction of a library comprises the initial step of
constructing and testing a target vector, i.e., a vector comprising a nucleic acid
encoding an acceptor sequence. For example, a gene or gene fragment which encodes
a polypeptide is cloned into a vector, such as a plasmid. Preferably, the polypeptide
exists in a state at least under certain conditions, i.e., comprises an activity, can bind a
molecule, exist in a conformation, emit light, transfer electrons, catalyze a substrate,
etc. under those conditions.

Preferably, the plasmid comprises a reporter sequence for monitoring the
efficacy of the cloning process. Suitable reporter genes include any gene that
expresses a detectable gene product which may be RNA or protein. Examples of
reporter genes include, but are not limited to: CAT (chloramphenicol acetyl
transferase); luciferase, and other enzyme detection systems, such as β-galactosidase,
firefly luciferase, bacterial luciferase, phycobiliproteins (e.g., phycoerythrin); GFP;
alkaline phosphatase; and genes encoding proteins conferring drug/antibiotic
resistance, or which encode proteins required to complement an auxotrophic
phenotype. Other useful reporter genes encode cell surface proteins for which
antibodies or ligands are available. Expression of the reporter gene allows cells to be
detected or affinity purified by the presence of the surface protein.

The reporter gene also may be a fusion gene that includes a desired
transcriptional regulatory sequence, for example, to select for a fusion molecule
whose switching functions include the ability to modulate transcription.

Generation of Insertion Sequences 

Nucleic acids encoding polypeptide insertion sequences can be obtained via a
number of routes, including, but not limited to, one or more of: amplification (e.g.,
using primers which flank a nucleic acid sequence encoding a domain of interest),
reverse transcription, cloning, and chemical synthesis.

In one aspect, a nucleic acid can be amplified using primers designed to
provide convenient restriction sites or promoter sequences for further cloning steps.
This nucleic acid can be cloned into a vector and digested with restriction
endonucleases as in Figure 2A to produce the desired insertion sequence.

Construction of Random Insertion Libraries

In one aspect, a target vector comprising the nucleic acid encoding the
acceptor polypeptide is randomly linearized (see, Figures 2B and 2C). A variety of
different nucleases and digestion schemes can be used. For example, the vector may
be exposed to DNase/Mn2+ digestion followed by polymerase/ligase repair; S1
nuclease digestion followed by polymerase/ligase repair; and S1 nuclease digestion
which is not repaired. The three schemes differ in (a) the methods used to create the
random double-stranded break in the target plasmid and (b) whether or not the nucleic
acid (e.g., DNA) is repaired by polymerase/ligase treatment, or other methods.
However, it should be obvious to those of skill in the art that any method of
introducing breaks into a DNA molecule can be used (e.g., such as digestion by mung
bean nucleases, endonucleases, restriction enzymes, exposure to chemical agents,
irradiation, and/or mechanical shearing) and that the methods of introducing breaks
described above are not intended to be limiting.

Preferably, digestion is controlled such that a significant fraction of DNA is
undigested in order to maximize the amount of linear DNA that only has one double
strand break (see, e.g., Example 1, Table 2). Key features for optimizing DNaseI
digestion include the use of Mg2+-free DNaseI (Roche Molecular Biochemicals), a
digestion temperature of 22 °C and 1 mM Mn2+ instead of Mg2+ to increase the ratio
of double strand breaks to nicks (see, e.g., as described in Campbell and Jackson,
1980, J. Biol. Chem. 255: 3726-35).
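
Why leaving much of the plasmid undigested favors singly cut molecules can be illustrated with a simple probabilistic model. The Python sketch below assumes that double-strand breaks occur independently, so that the number of breaks per plasmid follows a Poisson distribution. This model, the function name single_cut_fraction, and the example values are assumptions made for illustration and are not part of the disclosed method; nicks are ignored.

```python
import math


def single_cut_fraction(mean_breaks):
    """Return (fraction_uncut, fraction_of_cut_plasmids_with_one_break).

    Assumes a Poisson-distributed number of double-strand breaks per
    plasmid with the given mean (an illustrative model only).
    """
    if mean_breaks <= 0:
        raise ValueError("mean_breaks must be positive")
    p_uncut = math.exp(-mean_breaks)   # P(0 breaks)
    p_single = mean_breaks * p_uncut   # P(exactly 1 break)
    return p_uncut, p_single / (1.0 - p_uncut)


if __name__ == "__main__":
    for mean in (0.25, 0.5, 1.0, 2.0):
        uncut, single = single_cut_fraction(mean)
        print(f"mean={mean}: uncut={uncut:.2f}, "
              f"single-break share of cut DNA={single:.2f}")
```

Under this model, lighter digestion leaves more plasmid uncut but raises the share of linearized molecules that carry exactly one break, which is consistent with the preference stated above.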

The DNA can be repaired using methods known in the art, for example, using
T4 DNA ligase and T4 DNA polymerase (see, e.g., Graf and Schachman, 1996, Proc.
Natl. Acad. Sci. USA 93: 11591-11596) and dephosphorylated. Ligation with nucleic
acids encoding the insert is then performed to generate the collection of nucleic acids
(e.g., library members).

Incremental truncation libraries can be used to examine all possible insertion
points within a given region of an acceptor molecule (see, Figure 2C). Incremental
truncation used within the context of the present invention is a combinatorial solution
to identifying active, bisected proteins that would be difficult to predict a priori.
Libraries can be recombined in vitro by methods such as DNA shuffling (Stemmer,
1994, Proc. Natl. Acad. Sci. USA 91: 10747-10751) to explore new areas of sequence
space (see, e.g., Lutz, et al., 2001, Proc. Natl. Acad. Sci. USA 98: 11248-11253).

Preferably, random insertion libraries according to the invention comprise at 
least about lO'^-lO^ library members. More preferably, insertion libraries comprise at 
least two times the number of base pairs in a target nucleic acid (e.g., a nucleic acid
comprising acceptor DNA and other vector sequences). More preferably, a library
comprises one or more of: deletions at the insertion site and duplications at the
insertion site, as well as direct insertions with neither duplications nor deletions.
Generally, library members may comprise small deletions or tandem duplications on
the order of at least about 1-20 bases; however, larger duplications or deletions on the
order of about half the length of a gene also may be tolerated and/or desirable.
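
The three kinds of library member named above (direct insertion, deletion at the insertion site, and tandem duplication at the insertion site) can be distinguished from the acceptor coordinates that flank the insert. The following Python sketch shows one way to do so. The function and type names, and the 1-based coordinate convention, are hypothetical choices made for illustration and do not appear in this specification.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    kind: str   # "direct insertion", "deletion", or "tandem duplication"
    size: int   # number of acceptor bases deleted or duplicated


def classify_junction(last_before: int, first_after: int) -> Junction:
    """Classify an insertion site from 1-based acceptor coordinates.

    last_before: last acceptor base present upstream of the insert.
    first_after: first acceptor base present downstream of the insert.
    """
    if last_before < 1 or first_after < 1:
        raise ValueError("coordinates must be 1-based positive integers")
    # Bases skipped (positive) or repeated (negative) across the insert.
    gap = first_after - last_before - 1
    if gap == 0:
        return Junction("direct insertion", 0)
    if gap > 0:
        return Junction("deletion", gap)
    return Junction("tandem duplication", -gap)


if __name__ == "__main__":
    print(classify_junction(100, 101))  # acceptor resumes at the next base
    print(classify_junction(100, 106))  # bases 101-105 are missing
    print(classify_junction(100, 98))   # bases 98-100 flank both sides
```

A tally of these classifications over sequenced library members gives the distribution of direct insertions, deletions, and tandem duplications in a library.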

Evaluation of Insertion Libraries: Identification of Fusion Molecules 

In one aspect, transformants are selected which express a reporter gene
included in the target vector, such as a drug resistance gene, to initially screen for
fusion molecules. Alternatively, or additionally, transformants can be selected in
which the state of the insertion sequence is coupled to the state of the acceptor
sequence (see, e.g., Figure 2D). Thus, in one aspect, the existence of each state is
assayed for, as is the dependence of each state on existence of one or more other
states. States may be assayed for simultaneously, or sequentially, in the same host
cell or in clones of host cells. Fusion molecules also can be isolated from host cells
(or clones thereof) and their states can be assayed for in vitro.

For example, in one aspect, the enzymatic activity of an insertion sequence or
acceptor sequence is assayed for at the same time that the binding activity of the
respective other portion of the fusion is evaluated (see, e.g., as described further in
Example 1 and Table 2) to identify fusion molecules in which enzymatic activity is
dependent on binding activity.

In another aspect, fusion molecules are screened for which bind to a molecule,
such as a bio-effective molecule (e.g., a drug, therapeutic agent, toxic agent, agent for
affecting cellular physiology). The bound fusion molecule is exposed to a cell, and
the ability of the fusion molecule to be localized intracellularly is determined.
Preferably, release of the bio-effective molecule in response to intracellular
localization also is determined.

For example, a cell can be transiently permeabilized (e.g., by exposure to a
chemical agent such as Ca2+ or by electroporation) and exposed to a fusion molecule
associated with the bio-effective molecule (e.g., bound to the bio-effective molecule),
allowing the fusion molecule and bound molecule to gain entry into the cell. The
ability of the fusion molecule to localize to an intracellular compartment (e.g., to the
endoplasmic reticulum, to a lysosomal compartment, nucleus, etc.) along with the
bio-effective molecule can be monitored through the presence of a label (e.g., such as a
fluorescent label or radioactive label) on the fusion molecule, bio-effective molecule,
or both. The label can be conjugated to the fusion molecule and/or the bio-effective
molecule using routine chemical methods known in the art. A label also may be
provided as part of an additional domain of the fusion molecule. For example, the
fusion molecule can comprise a GFP polypeptide or modified form thereof. The
localization of the label (and hence the fusion molecule and/or bio-effective molecule)
can be determined using light microscopy. Release of the bio-effective molecule can
be monitored by lysing the cell, immunoprecipitating the fusion molecule, and
detecting the amount of labeled bio-effective molecule in the precipitated fraction.

In one aspect, the cell need not be permeabilized to allow entry of the fusion
molecule because the fusion molecule comprises a signal sequence that enables the
fusion molecule to traverse the cell membrane. Intracellular transport of the
bio-effective molecule can be monitored by labeling the bio-effective molecule and
examining its localization using light microscopy, FACS analysis, or other methods
routine in the art.

In another aspect, insertion libraries are screened for fusion molecules which
comprise an insertion sequence or acceptor sequence which associates with a
bio-effective molecule and which releases the bio-effective molecule when the respective
other portion of the fusion binds to a cellular marker of a pathological condition.
Thus, in one aspect, fusion molecules associated with a bio-effective molecule are
contacted with cells expressing such a marker and the ability of the fusion molecules
to specifically bind to the cell is assayed for, as well as the ability of the fusion
molecule to release the bio-effective molecule in response to such binding. For
example, as above, either, or both, the fusion molecule and the bio-effective molecule
can be labeled and the localization of the molecules determined. The action of the
bio-effective molecule also can be monitored (e.g., the effect of the bio-effective
molecule on the cell can be monitored).

In a preferred aspect, the insertion library comprises members in which the
insertion or acceptor sequence comprises the human serum transferrin (HST) transport
domain while the respective other portion of the fusion comprises a binding domain
for binding to an anti-cancer drug. In one preferred aspect, the binding domain
comprises the methotrexate-binding domain of the dihydrofolate reductase
polypeptide (DHFR). At least two methods for the identification of fusions with the
desired activity can be used. In the first, a DHFR-HST library is displayed on the
surface of phage and panned against methotrexate immobilized on a solid phase such
as agarose. Fusions are selected for which bind the drug in the presence of iron at
physiological pH (7.4), but which release methotrexate when HST releases its iron in
a mildly acidic wash. After each round of selection, the library will be sampled and
DHFR activity at physiological and acidic pH will be measured in order to evaluate
fusion molecules selected.

The second strategy takes advantage of selective inhibition of bacterial DHFR
by the antibacterial drug trimethoprim. E. coli cannot grow in the presence of
trimethoprim unless the bacteria are expressing a functional mammalian DHFR.
Therefore, in a first step, a non-phage display library of DHFR-HST fusions is
expressed in E. coli and those fusions that exhibit DHFR activity are selected by
growth on plates at physiological pH containing trimethoprim. Assuming that DHFR
activity correlates with methotrexate binding and that conformational changes in the
DHFR-HST fusion that disrupt trimethoprim binding also disrupt methotrexate
binding, those colonies selected in the first step are screened for no growth on plates
at acidic pH containing trimethoprim in order to identify fusions with the ability to
release methotrexate at acidic pH.
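
The logic of this two-step trimethoprim screen can be written down compactly. The Python sketch below is only an illustration of the decision rule: the Clone record, its field names, and the function name candidate_switches are hypothetical, and growth is reduced to a yes/no observation. It inherits the assumption stated above that DHFR activity correlates with methotrexate binding.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Clone:
    name: str
    grows_at_neutral_ph: bool  # growth on trimethoprim plates at pH 7.4
    grows_at_acidic_ph: bool   # growth on trimethoprim plates at acidic pH


def candidate_switches(clones: Iterable[Clone]) -> List[str]:
    """Return names of clones that pass both steps of the screen."""
    # Step 1: keep fusions with DHFR activity at physiological pH.
    step_one = [c for c in clones if c.grows_at_neutral_ph]
    # Step 2: of those, keep fusions that lose activity at acidic pH.
    return [c.name for c in step_one if not c.grows_at_acidic_ph]


if __name__ == "__main__":
    observed = [
        Clone("clone-1", True, True),    # active at both pH values
        Clone("clone-2", True, False),   # pH-dependent: candidate switch
        Clone("clone-3", False, False),  # inactive fusion
    ]
    print(candidate_switches(observed))
```

Only clones that grow at physiological pH and fail to grow at acidic pH are reported as candidates for pH-dependent release of methotrexate.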

In still another aspect, insertion libraries are screened for fusion molecules
which can switch from a non-toxic state to a toxic state upon binding of the insertion
sequence or acceptor sequence to a cellular marker of a pathology. As above, fusion
molecules can be selected which specifically bind to cells expressing the marker and
the effect of the fusion molecules on cell death can be assayed for. Cell death can be
monitored using methods routine in the art, including, but not limited to: staining cells
with vital dyes, detecting spectral properties characteristic of dead or dying cells,
evaluating the morphology of the cells, examining DNA fragmentation, detecting the
presence of proteins associated with cell death, and the like. Cell death also can be
evaluated by determining the LD50 or LC50 of the fusion molecule.

In a further aspect, the insertion library is screened for fusion molecules which
comprise a molecular switch for controlling a cellular pathway. Preferably, the states
of the insertion sequence and acceptor sequence in the fusion molecules are coupled
and responsive to a signal such that in the presence of the signal, the state of either the
insertion sequence or the acceptor sequence modulates the activity or expression of a
molecular pathway molecule in a cell. A signal can be the presence, absence, or level
of an exogenous or endogenous binding molecule to which either the insertion
sequence or acceptor sequence binds, or can be a condition (e.g., chemical, optical,
electrical, etc.) in an environment to which the fusion molecule is exposed. The
ability of the fusion molecule to control a pathway can be monitored by examining the
expression and/or activity of pathway molecules which act downstream of a pathway
molecule whose expression and/or activity is being modulated.

In another aspect, fusion molecules are selected in which either the insertion
sequence or acceptor sequence binds to a nucleic acid molecule. For example, the
ability of fusion molecules to bind to a nucleic acid immobilized on a solid phase can
be monitored (e.g., membrane, chip, wafer, particle, slide, column, microbead,
microsphere, capillary, and the like). Preferably, fusion molecules are selected in
which nucleic acid binding activity is coupled to a change in state of the respective
other sequence of the fusion molecule. For example, nucleic acid binding activity can
be coupled to the binding activity of another portion of the fusion molecule, catalysis
by the other portion, the light emitting function of the other portion, electron
transferring ability of the other portion, ability of the other portion to change
conformation, and the like. Preferably, nucleic acid binding activity is coupled to the
response of the fusion molecule to a signal.

Nucleic acid binding activity also can be monitored by evaluating the activity
of a target nucleic acid sequence to which the fusion molecule binds. For example, in
one aspect, the fusion molecule binds to a nucleic acid regulatory sequence which
modulates the activity (e.g., transcription, translation, replication, recombination,
supercoiling) of another nucleic acid molecule to which the regulatory sequence is
operably linked. The nucleic acid regulatory molecule and its regulated sequence can
be provided as part of a nucleic acid molecule encoding the fusion molecule or can be
provided as part of separate molecule(s). The nucleic acid binding activity can be
monitored in vitro or in vivo. The ability of fusion molecules to bind to a nucleic acid
can also be determined in vivo using one-hybrid or two-hybrid systems (for example,
see, Hu, et al., 2000, Methods 20: 80-94).

In certain aspects, fusion molecules are selected which bind to a known
regulatory sequence or a sequence naturally found in a cell. In other aspects, a
sequence which is not known to be a regulatory sequence in a cell is selected for.
Preferably, such a sequence binds to the fusion molecule and modulates the activity of
another nucleic acid (in cis or in trans). Thus, the fusion molecule can be used to
select for novel nucleic acid regulatory sequences. Preferably, the fusion molecule
modulates the regulatory activity of the nucleic acid molecule in response to a signal,
as described above.

In still a further aspect, the insertion library is screened for fusion molecules
which are sensor molecules. Preferably, fusion molecules are screened for in which
either the insertion sequence or acceptor sequence binds to a target molecule and
wherein the respective other portion of the fusion molecule generates a signal in
response to binding. Signals can include: emission of light, transfer of electrons,
catalysis of a substrate, binding to a detectable molecule, and the like. To assay for
such fusions, members of the library can be screened in the presence of the target
molecule (e.g., in solution, or immobilized on a solid support) for the production of
the signal.

Evaluation of Structure: State Relationships in Fusion Molecules

Preferably, random library members having desired states are sequenced to
precisely identify the sequence of the fusions at the insertion site. More preferably,
all library members having desired states are sequenced. Sequence information can
be correlated with the ability of different portions of the fusion molecule to maintain
one or more states and to respond to one or more signals. A plurality of active
insertion points, and preferably, all possible insertion points, can be mapped onto a
crystal structure of the acceptor sequence (e.g., such as an acceptor polypeptide).
Sites of insertion that produce allosteric control can be compared to sites in the
acceptor molecule predicted to be allosterically linked to a signaling molecule (e.g.,
such as a binding molecule or ligand) by comparisons of the structures of the acceptor
molecule in the presence or absence of the signaling molecule (see, e.g., Starzyk, et
al., 1989, Biochemistry 28: 8479-8484).

In another aspect, non-functional fusion molecules also are sequenced to
determine structures which are not appropriate to maintain particular states and/or
respond to signals.

In a further aspect, fusion molecules are mutagenized to identify molecular
switches with optimal properties. Preferably, the sequences of such molecules also are
determined. In one aspect, "first round switches" are identified by screening a library
of domain insertions and optimized to select for "second round switches" with
improved properties. For example, combinatorial (e.g., error-prone PCR, DNA
shuffling, etc.) and/or rational methods can be used to select for switches with
increased activity, stability, and/or improved switching capacity (e.g., ability to
respond to a wider or narrower range of signal). Preferably, second round switches are
also sequenced to identify sequence alterations associated with improved properties.

Conditional Heterodimerization 

Many proteins can have their peptide backbone cut by proteolytic or genetic
means, yet the two fragments can associate to make an active heterodimer. This
phenomenon of "monomer to heterodimer conversion" is referred to as protein
fragment complementation. However, there are many locations where such a
conversion is not feasible, presumably due to inefficient assembly or improper
folding of the fragments. This can be overcome by fusion of the fragments to
dimerization domains to facilitate correct assembly. Such "assisted protein
reassembly" has been shown for a few proteins (Pelletier, et al., 1998, Proc. Natl.
Acad. Sci. USA 95: 12141-12146; Spencer, et al., 1993, Science 262: 1019-24;
Michnick, et al., 2000, Methods Enzymol. 328: 208-30; Remy and Michnick, 1999,
Proc. Natl. Acad. Sci. USA 96: 5394-5399; Remy, et al., 1999, Science 283:
990-993; Ghosh, et al., 2000, J. Am. Chem. Soc. 122: 5658; Johnson and Varshavsky,
1994, Proc. Natl. Acad. Sci. USA 91: 10340-10344; Karimova, et al., 1997, Proc.
Natl. Acad. Sci. USA 94: 8405-8410; Rossi, et al., 2000, Methods Enzymol. 328:
231-51). However, thus far, such methods have been used exclusively in two-hybrid
systems to evaluate protein-protein interactions (Remy and Michnick, 1999, supra;
Arndt, et al., 2000, J. Mol. Biol. 295: 627-39; Pelletier, et al., 1999, Nat. Biotechnol.
17: 683-90; Mossner, et al., 2001, J. Mol. Biol. 308: 115-22) and have not been
exploited to generate molecular switches.

The invention provides a pair of fusion molecules comprising a first portion
and second portion. The first and second portions represent the fragments of a
bisected polypeptide which cannot function or exist in a particular state unless both
portions are brought into sufficient proximity. Preferably, each portion is fused to an
oligomerization domain (see, e.g., Figure 1B, Figures 5A-C, and Example 2 below),
thereby generating a pair of fusion molecules. Unlike the protein fragment
complementation systems described in the prior art, the fusion molecules according to
the invention oligomerize only in the presence of a signal, providing a means to
switch ON the activity/state of the polypeptide in the presence of the signal. Suitable
signals include any described above for domain insertion fusion molecules.

Suitable oligomerization motifs include, but are not limited to, dimerization
motifs such as the LexA dimerization domain (Golemis and Brent, 1992, Mol. Cell.
Biol. 12: 3006), lambda cI dimerization domain, leucine zipper dimerization domains
(e.g., such as from GCN4 leucine zippers, antiparallel leucine zippers, p21, and the
like), ras GTPase/ras-binding domain, FADD/FAS dimerization domains, EGF
receptor dimerization domains, the FKBP/FRAP dimerization domains, the
tetramerization domain of p53, and the tetramerization domain of BCR-ABL. In
addition, the art also provides a variety of techniques for identifying other naturally
occurring oligomerization domains, as well as oligomerization domains derived from
mutant or artificial sequences (see, e.g., Zeng et al., 1997, Gene 185: 245).

In a preferred aspect, leucine zippers are used as dimerization domains to 
assemble fragments of a polypeptide. Each domain of a leucine zipper is relatively 
simple, comprising an approximately 30 amino acid helix. Further, depending on 
their sequence, leucine zippers can dimerize in a parallel or antiparallel configuration, 
thus offering two distinct geometries for re-assembly of an active polypeptide. Both 
parallel and antiparallel leucine zippers have been shown to assist the reassembly of 
fragments of proteins. Because much is known about the interactions that stabilize 
dimerization, zippers of different affinity are readily available. Finally, leucine 
zippers have been shown to be expressed well in E. coli. 

In one preferred aspect, oligomerization occurs on binding of the 
oligomerization domains to a small molecule, such as a CID. A CID is a synthetic 
ligand having two binding surfaces that facilitate the dimerization of domains fused to 
target proteins (see, e.g., Spencer, et al., 1993, Science 262: 1019-24; Rivera, et al., 
1998, Methods 14: 421-9). CIDs have been used to facilitate the dimerization of 
domains fused to target proteins. CIDs also have been used to initiate signaling 
pathways by dimerizing receptors on the cell surface, to translocate cytosolic proteins 
to the plasma membrane, to import and export proteins from the nucleus, to induce 
apoptosis, and to regulate gene transcription (Farrar, et al., 2000, supra; Bishop, et al., 
2000, Annu. Rev. Biophys. Biomol. Struct. 29: 577-606). However, CIDs reported in 
the art have not been used as switches to activate previously inactive proteins in cells. 

Suitable CIDs for use in the present invention include, but are not limited to: 
the immunosuppressant FK506 (Spencer, et al., 1993, supra); coumermycin (which 
induces dimerization of GyrB-containing fusion proteins) (see, Farrar, et al., 2000, 
Methods Enzymol. 327: 421-9); and rapamycin. Novel CIDs can be screened for 
using combinatorial libraries to identify molecules capable of inducing 
oligomerization of oligomerizing domains. 

Types of proteins which can be bisected generally can include any of the 
domains described above as suitable for insertion sequences or acceptor sequences. 
In one aspect, bisected molecules include, but are not limited to: dihydrofolate 
reductase (DHFR) (Pelletier, et al., 1998, Proc. Natl. Acad. Sci. USA 95: 
12141-12146; Remy, et al., 1999, Proc. Natl. Acad. Sci. USA 96: 5394-5399; Remy, et al., 
1999, Science 283: 990-993); E. coli glycinamide ribonucleotide transformylase 
(PurN) (Michnick, et al., 2000, supra); green fluorescent protein (Ghosh, et al., 2000, 
J. Am. Chem. Soc. 122: 5658); ubiquitin (Johnson and Varshavsky, 1994, Proc. Natl. 
Acad. Sci. USA 91: 10340-10344; Karimova, et al., 1998, Proc. Natl. Acad. Sci. USA 
95: 5752-6); β-galactosidase (Rossi, et al., 1997, Proc. Natl. Acad. Sci. USA 94: 
8405-8410; Rossi, et al., 2000, Methods Enzymol. 328: 231-51); and aminoglycoside and 
hygromycin B phosphotransferases (Michnick, et al., 2000, supra), as these have been 
shown to be tolerant of bisections. 

Fusion molecules additionally may comprise flexible linkers, stabilizing 
sequences, affinity sequences, and the like, as described above. 

In contrast to reassembled proteins described in the art, the conditional 
heterodimers of the invention may include duplicated residues and/or deletions at the 
site of bisection. As shown in Figure 7B, in one aspect, libraries comprising the 
heterodimers may have small to large duplications and/or deletions in both nucleic 
acid fragments encoding the respective portions of the bisected polypeptide, 
increasing the diversity of molecules which may be evaluated for switching function. 
Further, unlike reassembled proteins described in the art, linker sequences are not 
required between the dimerization domain and the bisected portion of the polypeptide. 
Therefore, in one aspect, the invention provides a fusion molecule comprising a 
portion of a bisected polypeptide fused to an oligomerization domain, wherein the 
fusion molecule does not comprise a linker sequence and the oligomerization domain 
is responsive to a signal. Preferably, the response of the oligomerization domain to 
the signal brings respective portions of the bisected polypeptide together. 

In another aspect, the invention provides a pair of fusion molecules which 
each comprise respective portions of a bisected polypeptide fused to oligomerization 
domains, wherein the respective portions of the bisected polypeptide are encoded by 
nucleic acids comprising a duplication or deletion at the bisection site. 

Generation of Conditional Heterodimers 

The strategy for generating pairs of fusion molecules for forming conditional 
heterodimers is illustrated in Figures 6A-B. In the example shown in the Figures, a 
polypeptide comprising an activity (e.g., such as an enzymatic activity) is 
systematically bisected by fragmenting a gene encoding the polypeptide to generate a 
plurality of bisected polypeptides. Preferably, all possible bisections are represented. 
In subsequent, or the same, cloning steps, nucleic acids encoding oligomerization 
sequences are ligated in frame to the nucleic acids encoding the plurality of bisected 
polypeptides. Pairs of fusion molecules so generated are screened for those which are 
able to dimerize (e.g., restoring the activity of the bisected polypeptides). 
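
The systematic bisection described above can be sketched in a few lines of code. This is a minimal illustration only: the polypeptide and domain sequences are hypothetical placeholders, the function name is invented for this sketch, and the choice to fuse each oligomerization domain directly at the bisection site is an assumption rather than a detail taken from the Figures.

```python
def enumerate_bisections(polypeptide: str, domain_a: str, domain_b: str):
    """Yield (cut, n_fusion, c_fusion) for every cut between adjacent residues.

    Assumption: each oligomerization domain is fused at the bisection site,
    i.e. after the N-terminal fragment and before the C-terminal fragment.
    """
    for cut in range(1, len(polypeptide)):
        n_fragment = polypeptide[:cut]   # residues 1..cut
        c_fragment = polypeptide[cut:]   # residues cut+1..end
        yield cut, n_fragment + domain_a, domain_b + c_fragment


# Hypothetical placeholder sequences, for illustration only.
POLYPEPTIDE = "MKTAYIAKQR"
DOMAIN_A = "LEALKEK"
DOMAIN_B = "KEALEKL"

pairs = list(enumerate_bisections(POLYPEPTIDE, DOMAIN_A, DOMAIN_B))
print(len(pairs))  # one pair per internal peptide bond
```

A polypeptide of N residues has N-1 internal bisection points, which is what "all possible bisections" amounts to in this simplified picture.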

In one aspect, incremental truncation is used to engineer a conditional 
heterodimer. In the example for implementing this approach, shown in Figures 6A-B, 
two overlapping fragments of a gene encoding a polypeptide whose state is to be 
switched are cloned into vectors. Incremental truncation libraries from the 3' end of 
the 5' fragment and the 5' end of the 3' fragment are prepared using time-dependent 
exonuclease digestion (Ostermeier, et al., 1999, Proc. Natl. Acad. Sci. USA 96: 
3562-3567) or α-phosphothioate nucleotide incorporation (Lutz, et al., 2001, Nucleic Acids 
Res. 29: e16) to generate linear fragments. Preferably, as with domain insertion 
libraries, these libraries comprise deletions and/or duplications at the insertion site. 
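
How a pair of truncated fragments gives rise to an exact bisection, a deletion, or a duplication can be illustrated with a small sketch. It is a simplification under stated assumptions: truncation endpoints are expressed in residue numbers (real truncations occur at the nucleotide level), and the function name and example numbers are hypothetical.

```python
def classify_pair(n_end: int, c_start: int) -> str:
    """Classify a fragment pair by comparing truncation endpoints.

    n_end   -- last residue retained in the N-terminal (5'-encoded) fragment
    c_start -- first residue retained in the C-terminal (3'-encoded) fragment
    """
    gap = c_start - n_end - 1
    if gap == 0:
        return "exact bisection"
    if gap > 0:
        return f"deletion of {gap} residue(s)"
    return f"duplication of {-gap} residue(s)"


# Hypothetical endpoints, for illustration only.
for n_end, c_start in [(100, 101), (100, 105), (100, 96)]:
    print(n_end, c_start, classify_pair(n_end, c_start))
```

Because the two libraries are truncated independently, every combination of endpoints can in principle occur, which is why the combined library samples deletions and duplications as well as exact bisections.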

To avoid the possibility that individual fragments are active on their own, the 
starting fragments preferably are designed such that they lack essential residues for 
functionality (e.g., such as residues at the N-terminal encoding portion or C-terminal 
encoding portion of the fragments). After truncation, vectors are recircularized such 
that the 3' truncated fragment is fused to stop codons in all three reading frames and 
the 5' truncation is fused to an ATG start codon. Separate libraries of 5' and 3' 
digested fragments are introduced into E. coli at concentrations that will maximize 
co-transformation of the 5' and 3' fragments, i.e., providing the potential to detect pairs of 
fusion molecules which dimerize in response to a signal. Nucleic acids encoding 
oligomerization domains (e.g., such as dimerization domains) can be linked to the 
fragments before or after or during the creation of the truncation libraries (e.g., by 
oligo assembly or by PCR). Preferably, the oligomerization domains are responsive 
to a signal. The ability of cells to recover polypeptide activity in the presence or 
absence of the oligomerization domain, and in the presence or absence of signal, is 
monitored. 

Cells exhibiting protein activity in the presence of signal are identified and the 
vectors expressing the respective halves of the polypeptide are sequenced. In one 
aspect, pairs of fusion molecules exhibiting the highest degree of activity are selected 
as targets for directed evolution. For example, gene fragments can be amplified by 
error-prone PCR (Caldwell and Joyce, 1995, in PCR Primer: A Laboratory Manual, 
Cold Spring Harbor Laboratory Press, Plainview, NY) such that on average each 
DNA molecule has one missense mutation. Such 5' and 3' gene fragments are again 
co-transformed and cells are selected which express the same or higher levels of 
activity. Preferably, cells that express higher levels of activity are identified (e.g., at 
least about 2-fold higher activity). Rescued constructs are sequenced to identify the 
nature of the mutation and to verify that mutations are not creating fragments whose 
encoded polypeptides oligomerize even in the absence of an oligomerization domain. 
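
To give a feel for what "on average one missense mutation per DNA molecule" implies for library composition, the following sketch assumes that the number of mutations per molecule follows a Poisson distribution. That distributional assumption is not stated in the text and is used here only as a common first approximation; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mutation_load_summary(mean: float = 1.0) -> dict:
    """Fractions of molecules with zero, one, or two-or-more mutations."""
    p0 = poisson_pmf(0, mean)
    p1 = poisson_pmf(1, mean)
    return {"none": p0, "exactly_one": p1, "two_or_more": 1.0 - p0 - p1}


summary = mutation_load_summary(1.0)
for label, fraction in summary.items():
    print(f"{label}: {fraction:.3f}")
```

Under this assumption, a mean of one mutation per molecule still leaves a substantial fraction of unmutated molecules in the library, which is one reason selected clones are sequenced rather than assumed to carry a mutation.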

In one aspect, after identifying pairs of fusion molecules whose activity can be 
restored through oligomerization, the oligomerization domains of these pairs are 
exchanged for oligomerization domains which are responsive to a signal (e.g., where 
the original domains were not responsive to a signal) or which respond to a different 
signal from the one recognized by the domains used to create the original fusion molecules. 

Expression Vectors For Expressing Fusion Molecules 

Identification of desired fusion molecules, whether domain insertions or 
conditional heterodimers, can be facilitated by the use of expression vectors in 
creating the libraries described above. Such expression vectors additionally can be 
useful for generating large amounts of fusion molecules (e.g., for delivery to a cell, or 
organism, for use in vitro or in vivo). 

Thus, in one aspect, library members comprise regulatory sequences (e.g., 
such as promoter sequences) which can be either constitutively active or inducible 
and which are operatively linked to acceptor sequences comprising insertion sequences. 
Regulatory sequences can comprise promoters and/or enhancer regions from a single 
gene or can combine regulatory elements of more than one gene. In a preferred 
embodiment, the regulatory sequences comprise strong promoters which allow high 
expression in cells, particularly in mammalian cells. For example, the promoter can 
comprise a CMV promoter and/or a Tet regulatory element. 

Library members also can comprise promoters to facilitate in vitro translation 
(e.g., T7, T4, or SP6 promoters). Such constructs can be used to produce fusion 
molecules in sufficient quantity to verify initial screening results (e.g., the 
ability of the molecules to function as molecular switches). 

The expression vectors can be self-replicating extrachromosomal vectors 
and/or vectors which integrate into a host genome. In one aspect, the expression 
vectors are designed to have at least two replication systems, allowing them to be 
replicated and/or expressed and/or integrated in more than one host cell (e.g., 
prokaryotic, yeast, insect, and/or mammalian cells). For example, the expression 
vectors can be replicated and maintained in a prokaryotic cell and then transferred 
(e.g., by transfection, transformation, electroporation, microinjection, cell fusion, and 
the like) to a mammalian cell. 

The expression vectors can include sequences which facilitate integration into 
a host genome (e.g., such as that of a mammalian cell). For example, the expression vector 
can comprise two homologous sequences flanking the nucleic acid sequence encoding 
the fusion molecule, facilitating insertion of the nucleic acid expressing the fusion 
molecule into the host genome through recombination between the flanking sequences 
and sequences in the host genome. Sequences such as lox-cre sites also can be 
provided for tissue-specific inversion of the fusion molecule nucleic acid with respect 
to a regulatory sequence to which the fusion molecule nucleic acid is operably linked. 

Integration into the host genome may be monitored by screening for the 
expression of a reporter sequence included in the expression vector, by the expression 
of the unique fusion molecule (e.g., by monitoring transcription via Northern Blot 
analysis or translation by an immunoassay), and/or by the presence of the switching 
activity in the cell. 

Host Cells For Expressing Fusion Molecules 

Fusion molecules according to the invention can be expressed in a variety of 
host cells, including, but not limited to: prokaryotic cells (e.g., E. coli, Staphylococcus 
sp., Bacillus sp.); yeast cells (e.g., Saccharomyces sp.); insect cells; nematode cells; 
plant cells; amphibian cells (e.g., Xenopus); fish cells (e.g., zebrafish cells); avian 
cells; and mammalian cells (e.g., human cells, mouse cells, mammalian cell lines, 
primary cultured mammalian cells, such as from dissected tissues). 

The molecules can be expressed in host cells isolated from an organism, host 
cells which are part of an organism, or host cells which are introduced into an 
organism. In one aspect, fusion molecules are expressed in host cells in vitro, e.g., in 
culture. In another aspect, fusion molecules are expressed in a transgenic organism 
(e.g., a transgenic mouse, rat, rabbit, pig, primate, etc.) that comprises somatic and/or 
germline cells comprising nucleic acids encoding the fusion molecules. 

Fusion molecules also can be introduced into cells in vitro, and the cells (e.g., 
such as stem cells, hematopoietic cells, lymphocytes, and the like) can be introduced 
into the host organism. The cells may be heterologous or autologous with respect to 
the host organism. For example, cells can be obtained from the host organism, fusion 
molecules introduced into the cells in vitro, and the cells then reintroduced into the host 
organism. 

Methods of Using Molecular Switches 

In one aspect, the invention provides a method for using a molecular switch to 
modulate a cellular activity. The cellular activity can include an enzyme activity, the 
activity of one or more cellular pathway molecules, the transduction of a signal, and 
the like. Modulation may be direct, e.g., the switch itself may alter the activity, or 
indirect, e.g., the switch may function by delivering a bio-effective molecule to the 
cell which itself modulates the activity. Modulation can occur in vitro (e.g., in cell 
culture or in a cell extract) or in vivo (e.g., such as in a transgenic organism). 
Molecular switches comprising fusion polypeptides also can be administered to a cell 
by delivering such molecules systemically (e.g., through intravenous, intramuscular, 
or intraperitoneal injections, or through oral administration of either the polypeptides 
themselves or nucleic acids encoding the polypeptides) or locally (e.g., via injection 
into a tumor or into an open surgical field, or through a catheter or other medical 
access device, or via topical administration). 

In one aspect, molecular switches are used to conditionally modulate an 
enzymatic activity in a cell. For example, a switch molecule can be introduced into a 
cell that comprises an insertion sequence or acceptor sequence which provides the 
enzymatic activity. Catalysis by the insertion or acceptor sequence is coupled to the 
response of the respective other portion of the fusion molecule to a signal, such as 
binding of the other portion to a molecule (e.g., such as an agent administered to the 
cell or a naturally occurring small molecule), exposure of the cell to particular 
chemical conditions (e.g., such as pH), electrical conditions (e.g., potential 
differences), optical conditions (e.g., exposure of the cell to light of specific 
wavelengths), magnetic conditions and the like. 

In another aspect, a molecular switch is provided which modulates the activity 
or expression of a molecular pathway molecule in a cell. Figure 3B shows an 
example of a switch molecule comprising a pathway molecule which is conditionally 
active in the presence of a signal (schematically illustrated as in the Figure). The 
switch molecule is used to alter a cell signaling pathway, e.g., altering the expression 
and/or activity of downstream pathway molecules (turning such molecules ON or 
OFF, or altering the level of expression and/or activity of such molecules). In doing 
so, the switch molecule can be used to regulate the fate of one or more cells. Similarly, 
the molecular switches according to the invention can be used to control metabolic 
pathways, e.g., providing a fusion molecule which provides an enzymatic activity 
coupled to the binding of a small molecule, or response to some other signal (as 
shown in Figure 3E). Preferably, modulation of the enzyme activity in response to the 
signal, in turn, modulates the expression and/or activity of molecules downstream in 
the metabolic pathway. 

More preferably, the states of the fusion molecules are coupled to a signal, 
such as the presence of an exogenous or endogenous binding molecule to which 
either the insertion sequence or acceptor sequence binds. The ability of the fusion 
molecule to control a pathway can be monitored by examining the expression and/or 
activity of pathway molecules which act downstream of a pathway molecule whose 
expression and/or activity is being modulated/controlled by the fusion molecule. 
Preferably, control of the pathway is coupled to the presence of the signal, e.g., 
binding of the fusion molecule to the exogenous or endogenous binding molecule, the 
presence of particular electrical or chemical properties of a cell, the presence or 
absence of particular wavelength(s) of light, and the like. 

Pathways of interest include the phosphatidylinositol-specific phospholipase 
pathway, which is normally involved with hydrolysis of phosphatidylinositol-4,5-
bisphosphate and which results in production of the secondary messengers 
inositol-1,4,5-trisphosphate and diacylglycerol. Other pathways include, but are not limited 
to: a kinase pathway, a pathway involving a G Protein Coupled Receptor, a 
glucocerebrosidase-mediated pathway, a cyclin pathway, an anaerobic or aerobic 
metabolic pathway, a blood clotting pathway, and the like. 

In still another aspect, a fusion molecule is provided which delivers a 
bio-effective molecule (e.g., a drug, therapeutic agent, diagnostic or imaging agent and 
the like) to a cell. In one scenario, shown in Figure 3C, the fusion molecule 
comprises an insertion or acceptor sequence which binds to the bio-effective 
molecule, while the respective other portion of the fusion binds to a cellular marker 
that is a signature of a pathology, e.g., a small molecule, polypeptide, nucleic acid, or 
metabolite whose expression (presence or level) is associated with the pathology. 
Preferably, the fusion molecule releases the bio-effective molecule only in the 
presence of the marker of the pathology. 

Figure 3D shows an alternative method of transporting a bio-effective 
molecule. In this aspect, the insertion sequence or acceptor sequence comprises a 
transport sequence for transporting a bio-effective molecule bound to the fusion 
molecule intracellularly. Preferably, the insertion sequence and acceptor sequence are 
functionally coupled such that a conformational change in the transport sequence is 
coupled to intracellular release of the bio-effective agent. Successful delivery can be 
monitored by measuring the effect of the bio-effective agent (e.g., its ability to 
mediate a drug action or therapeutic effect or to image a cell). More preferably, the 
conformation change occurs upon response of the respective other portion of the 
fusion to a signal (indicated schematically in the Figure as □ ), enabling conditional 
intracellular transport of the bio-effective molecule. When the bio-effective agent is 
delivered to one or more cells in an organism, the effect of the agent on the 
physiological responses of the organism can be monitored, e.g., by observing clinical 
or therapeutic endpoints as is routine in the art. Where the bio-effective molecule is 
an imaging molecule, the localization of the bio-effective molecule in the organism 
can be monitored by MRI, X-ray, angioplasty, and the like. 

In one preferred aspect, the transport sequence comprises the human serum 
transferrin (HST) polypeptide (see, Figure 4). HST mediates the transport and uptake 
of iron into cells. Iron-saturated HST binds to the transferrin receptors on cell 
surfaces and is internalized by endocytosis. In endosomes, the pH becomes mildly 
acidic, causing the release of iron and a concomitant conformational change in HST. 
The transferrin receptor recycles to the surface where HST is released and is free to 
bind more iron. As tumor cells express high levels of transferrin receptors, several 
strategies for the targeted delivery of toxic proteins and chemotherapeutic drugs using 
the transferrin uptake pathway have been pursued (Barbas, et al., 1992, J. Biol. Chem. 
267: 9437-9442; Trowbridge and Domingo, 1981, Nature 294: 171-173). A clinical 
trial has demonstrated that an HST/diphtheria toxin conjugate was effective for the 
treatment of recurrent malignant brain tumors in humans (see, e.g., Laske, et al., 1997, 
Nat. Med. 3: 1362-1368). HST has been demonstrated to tolerate insertions of 
peptides while retaining biological activity (see, e.g., Ali et al., 1999, J. Biol. Chem. 
274: 24066-24073). 

Therefore, in one aspect, the insertion sequence or acceptor sequence 
comprises an HST polypeptide or active portion thereof, while the respective other 
portion binds to a bio-effective molecule. The binding sequence-HST sequence 
functions like a 'Trojan horse' for transporting the bio-effective molecule into cells. 
A suitable binding sequence can comprise a dihydrofolate reductase (DHFR) which 
binds to the anti-cancer drug, methotrexate. 

As shown in Figure 4, outside the cell, the transferrin domain of the 'Trojan 
horse' fusion molecule binds iron and the binding domain binds the drug. The fusion 
interacts with the transferrin receptor and is endocytosed. A decrease in pH in the 
endosome causes a conformational change in the transferrin domain, resulting in a 
conformational change in the drug binding domain which occurs concomitant with 
drug release. The fusion is recycled back outside of the cell to repeat the cycle again. 
Because HST has a long circulating half-life and can continuously cycle in and out of 
a cell, multiple drug deliveries are possible using this scheme. Delivery of 
methotrexate can be optimized by selecting for fusion molecules which bind to 
methotrexate at lower affinities than natural DHFR, e.g., by in silico modeling or 
from mutagenesis studies (see, e.g., Miller and Benkovic, 1998, Chem. Biol. 5: 
R105-R113). 

In still another aspect, the invention provides a method for killing undesired 
cells, such as abnormally proliferating cells (e.g., cancer cells) (see, e.g., Figure 3E). 
For example, a fusion protein comprising a conditionally toxic molecule which targets 
to a cell having a pathology can be administered to a cell (or an organism comprising the 
cell). Preferably, the toxic state of the fusion protein is coupled to the response of the 
fusion protein to a signal, such as exposure to a marker of a pathology, causing the 
fusion protein to switch from a non-toxic state to a toxic state when it encounters the 
cell comprising the pathology. In one aspect, the change in state from a toxic to a 
non-toxic or less toxic molecule is coupled to binding of the fusion protein to the 
marker of the pathology. 

In a further aspect, a fusion molecule is provided for regulating an activity of a 
nucleic acid regulatory sequence in vitro or in vivo. Activities which can be regulated 
include transcription, translation, replication, recombination, supercoiling, and the 
like. Preferably, fusion molecules are selected in which binding of the insertion 
sequence or acceptor sequence of the fusion molecule to the nucleic acid regulatory 
sequence is coupled to the response of the respective other sequence of the fusion 
molecule to a signal. Such fusion molecules can be used to create cells with 
conditional knockouts or knock-ins of a gene product whose expression is mediated 
by the activity of the nucleic acid regulatory sequence to which the fusion molecule 
binds, e.g., by providing or withdrawing the signal as appropriate. In one aspect, the 
signal is a drug or therapeutic agent. In another aspect, the signal is a change in pH, a 
change in cellular potential, or a change in exposure of a cell (and/or organism) to 
light. For example, a probe for delivering particular wavelengths of light can be used 
to provide a highly localized signal to a cell expressing a fusion molecule in vivo. 

In still a further aspect, the fusion molecules according to the invention 
comprise sensor molecules that can be used to detect target analytes in vitro or in vivo 
(see, Figure 3G). Target analytes include, but are not limited to: small molecules, 
metabolites, lipids, glycoproteins, carbohydrates, amino acids, peptides, polypeptides, 
proteins, antigens, nucleotides, nucleic acids, cells, cell organelles, and small 
organisms (e.g., microorganisms such as bacteria, yeast, protists, and the like). 

The fusion molecule can be exposed to a target molecule in solution or stably 
associated with a solid support that can be exposed to a sample suspected of 
containing the target molecule. Alternatively, the fusion molecule can be expressed in 
a cell, i.e., for detecting intracellular or extracellular targets (for example, where the 
fusion molecule comprises an extracellular binding domain). Analyte present in the 
sample will bind to the fusion molecule, triggering production of a signal by the 
signaling portion of the molecule. Suitable signaling molecules from which this 
portion can be obtained include molecules capable of emitting light, e.g., such as 
GFP, or modified or mutant forms thereof (e.g., EGFP, YFP, CFP, EYFP, ECFP, 
BFP, and the like). Other signaling molecules include electron transferring domains 
(e.g., such that the electrical characteristics of the fusion molecule can be monitored 
to provide a measure of target analyte), binding domains (e.g., domains capable of 
binding to a labeled molecule), and catalytic domains (e.g., β-lactamase, luciferase, 
alkaline phosphatase, and the like). 

Signaling molecules which comprise catalytic domains can be detected by 
monitoring changes in the level of a fluorescent substrate. For example, when the 
catalytic domain is obtained from β-lactamase, fluorescent substrates such as 
CCF2/FA and CCF2/AM can be used (see, e.g., Zlokarnik, et al., Science 279: 84-88 
(1998)). 

In a further aspect, the invention provides a method for modulating a cellular 
response by conditionally providing a pair of fusion polypeptides to a cell to mediate 
the response. For example, the pair of fusion polypeptides can comprise a binding 
activity, an enzymatic activity, a signaling activity, a metabolic activity, and the like. 
In one aspect, the pair of fusion polypeptides modulates transcription, translation, or 
replication of the cell and/or alters a cellular phenotype in response to a signal. 

Preferably, each member of the pair comprises a portion of a polypeptide 
fused to an oligomerization domain. Neither portion by itself can function; however, 
when the portions are brought in proximity to each other, the activity of the 
polypeptide is restored. In one aspect, oligomerization of the oligomerization domain 
brings the portions of the polypeptide in proximity to each other and restores the 
function of the polypeptide. Preferably, oligomerization occurs in response to a signal 
(e.g., such as the presence of a molecule to which the oligomerization molecules must 
bind in order to oligomerize). 

Examples 

The invention will now be further illustrated with reference to the following 
examples. It will be appreciated that what follows is by way of example only and that 
modifications to detail may be made while still falling within the scope of the 
invention. 

Example 1. Generating Fusion Molecules by Domain Insertion 

A model system consisting of E. coli maltose binding protein ("MBP") as the 
acceptor polypeptide sequence and the penicillin-hydrolyzing enzyme TEM1 
β-lactamase as the insertion polypeptide sequence was chosen to test the combinatorial 
domain insertion strategy for coupling the two proteins' function. The desired 
property of the model switch is the ability to modulate β-lactamase activity through 
changes in maltose concentration (i.e., the switch molecule or fusion protein would 
behave as an allosteric enzyme). 

Construction And Testing Of Target Plasmid 

The E. coli MBP was cloned into plasmid pDIMC8 (Ostermeier and 
Benkovic, 1999, Nat. Biotechnol. 17: 1205-1209) under control of the IPTG-inducible 
tac promoter to create plasmid pDIMC8-Mal. The MIC for ampicillin of 
DH5α/pDIMC8-Mal on LB plates was found to be 30-35 μg/ml. 



Construction Of β-Lactamase Insert DNA 

The β-lactamase gene fragment bla[24-286] (encoding amino acids 24-286 
of β-lactamase) was amplified by PCR from pBR322 such that it was 
flanked by EarI restriction enzyme sites. Attempts to clone this construct into the 
BamHI site of pACYC184 resulted in very few transformants which, upon 
characterization, were found to contain plasmids that lacked the β-lactamase gene 
fragment. Thus, the first DNaseI library (described below) was constructed by 
digesting the bla[24-286] PCR product with EarI. Subsequently, it was found that the 
bla[24-286] fragment could be cloned into pTAdv to create the stable vector 
pTAdv-βlac. Subsequent libraries used a bla[24-286] insert isolated from this 
plasmid. It is preferable to use a bla[24-286] fragment derived from a plasmid digest 
since, unlike the PCR product, the insert DNA will be known not to contain any 
mutations. However, it may be useful in the future to create libraries in which the 
bla[24-286] insert has been mutated by error-prone PCR (see, Caldwell, 1995, supra). 
Note that the bla[24-286] fragment for insertion, in this example, does not contain a 
sequence coding for a flexible linker. However, flexible linkers can be useful for 
construction of molecular switches. 

Construction of Random Insertion Libraries 

Plasmid pDIMC8-Mal was randomly linearized using three different methods:
(1) DNaseI/Mn2+ digestion followed by polymerase/ligase repair; (2) S1 nuclease
digestion followed by polymerase/ligase repair; and (3) S1 nuclease digestion (not
repaired). The three protocols differ in (a) the methods used to create the random
double-stranded break in the target plasmid and (b) whether or not the DNA was
repaired by polymerase/ligase treatment. Digestion was controlled such that a
significant fraction of DNA was undigested in order to maximize the amount of linear
DNA that had only one double strand break (see Table 2). Key features for
optimizing the DNaseI digestion were the use of Mg2+-free DNaseI (Roche Molecular
Biochemicals), a digestion temperature of 22 °C and 1 mM Mn2+ instead of Mg2+ to
increase the ratio of double strand breaks to nicks (see, e.g., Campbell and Jackson,
1980, supra).
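The rationale for leaving much of the plasmid undigested can be illustrated with a simple Poisson model. This model is an assumption introduced here for illustration and is not part of the protocol above: it treats double-strand breaks as independent events and ignores nicking. Under that assumption the undigested fraction equals exp(-lambda), where lambda is the mean number of breaks per plasmid, and the function names below are hypothetical.

```python
import math

def mean_breaks_from_uncut(uncut_fraction):
    """Mean breaks per plasmid (lambda) implied by P(0 breaks) = exp(-lambda)."""
    return -math.log(uncut_fraction)

def single_break_share(uncut_fraction):
    """Among plasmids cut at least once, the share cut exactly once (Poisson assumption)."""
    lam = mean_breaks_from_uncut(uncut_fraction)
    return lam * math.exp(-lam) / (1.0 - math.exp(-lam))

# Heavier digestion (smaller uncut fraction) lowers the share of singly cut molecules.
for uncut in (0.10, 0.25, 0.50):
    share = single_break_share(uncut)
    print(f"uncut {uncut:.0%}: single-break share of cut DNA = {share:.2f}")
```

In this sketch, the more DNA is left uncut, the larger the proportion of linear molecules that carry a single break, which is the trade-off the digestion conditions aim to exploit.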

The DNA was repaired using T4 DNA ligase and T4 DNA polymerase (Graf
and Schachman, 1996, Proc. Natl. Acad. Sci. USA 93: 11591-11596) (except for
method (3)) and dephosphorylated. Ligation with the bla[24-286] insert DNA and
transformation into DH5α produced 10^-10*^ transformants with a small to large
fraction (depending on the method) of the transformants containing the bla[24-286]
insert (Table 2).

Preparing The Inserted Gene For Insertion

As an example, the preparation of the DNA of the inserted gene will be
described for β-lactamase. All the random insertion methods require that the inserted
DNA (bla) be prepared as a linear piece of dsDNA with blunt ends containing only
the DNA sequence desired to be inserted. The desired DNA is the DNA that codes
for amino acids 24 to 286 of TEM-1 β-lactamase in pBR322 (bla[24-286]). Amino
acids 1-23 are not desired because they are the signal sequence that targets
β-lactamase to the periplasm. This sequence gets cleaved upon entering the periplasm
and is not part of the mature, active β-lactamase. In the fusion constructs, the natural
signal sequence of malE will direct the fusions to the periplasm. The bla[24-286]
DNA will be prepared as in Figure 2A by amplifying the DNA such that the sequence
is between EarI restriction sites. This DNA is cloned into the BamHI site of
pACYC184 to create pACYC-BLA. As shown in Figure 2A, this construct can be
digested with EarI and the bla[24-286] DNA treated with Klenow DNA polymerase
to achieve the desired fragment for insertion. This is achieved by virtue of the fact
that EarI is a type IIS restriction enzyme that binds a non-palindromic sequence and
cleaves outside this sequence.

To achieve the correct geometric configuration and flexibility in the fusions, it
may be necessary to include flexible linkers in the fusions at the insertion site. For
example, suitable linkers include, but are not limited to: GlyGlyGlySer on the
N-terminus and SerGlyGlyGly on the C-terminus. Linkers can be added by amplifying
the bla[24-286] DNA such that the DNA sequence 5'-GGTGGTGGCAGC-3'
is added to the 5' end and the sequence 5'-AGCGGTGGCGGC-3' is added to the 3'
end.
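As a quick check that the two linker-encoding DNA sequences match the stated peptide linkers, the following minimal sketch translates them codon by codon. The codon table is deliberately restricted to the three codons that occur in these sequences (standard genetic code), and the function name is hypothetical.

```python
# Minimal codon table: only the codons used in the two linker sequences
# (assignments taken from the standard genetic code).
CODONS = {"GGT": "Gly", "GGC": "Gly", "AGC": "Ser"}

def translate(dna):
    """Translate a DNA string whose length is a multiple of 3, codon by codon."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

print(translate("GGTGGTGGCAGC"))  # linker added to the 5' end (N-terminal linker)
print(translate("AGCGGTGGCGGC"))  # linker added to the 3' end (C-terminal linker)
```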

Construction And Characterization Of Insertion Libraries 

Two general methods are employed: (1) insertion into a plasmid with a
random double-stranded break prepared by nuclease digestion and (2) insertion into a
gene using CP-ITCHY.

For the former, three related strategies differing in the nature and order of use
of the nucleases will be used to create a single double strand break in a
plasmid containing the MBP gene:

(1) limited DNaseI digestion in the presence of Mn2+;

(2) limited DNaseI digestion in the presence of Mg2+ to produce a single nick,
followed by S1 nuclease or mung bean nuclease digestion to cleave opposite the nick;
and

(3) limited digestion with S1 nuclease. S1 nuclease can convert supercoiled circular
DNA to linear DNA by first making a nick on one of the two strands and then cutting
across from this nick (Germond, et al., 1974, Eur. J. Biochem. 43: 591-600),
particularly under conditions of low ionic strength (Gonikberg, 1979, Mol. Biol.
(Mosk) 13: 1064-9).

Although the first two methods have been used for linker scanning
mutagenesis (the random insertion of short sequences), there is little published data on
the nature of the sequences at the insertion site of the naive libraries, and this data is
sometimes conflicting. Preferably, for all libraries generated, random members of the
naive libraries are selected and the DNA at the insertion sites sequenced to quantify
the distribution and sizes of deleted DNA, direct insertions and tandemly duplicated
DNA at the insertion site. In particular, insertions in which sequences of the insertion
sequence are tandemly duplicated may be useful for the same reasons that protein
fragments that exhibit protein fragment complementation often have overlapping
sequences. Such overlapping sequences are thought to transiently protect exposed
regions during folding. Duplications or deletions also are likely to be important for
creating molecular switches by affecting the distance and interactions between
insertion and acceptor sequences.

Incremental truncation methods also can be used for generating libraries of
molecules to provide fusion molecules which have larger deletions and tandem
duplications at the insertion site. The size of these tandem duplications (or even
deletions) can be controlled by size selection of the library.

Selection Of Active Fusions: β-Lactamase-MBP Fusions

Once β-lactamase-MBP insertion libraries have been constructed, they are
subjected to selection to identify those library members that have both β-lactamase
and MBP activity as well as those in which β-lactamase activity depends on the
presence or absence of maltose. The selection scheme is outlined in Figure 2D.
Fusions with a functional β-lactamase domain can be identified by growth of bacteria
expressing the fusions on plates containing ampicillin (Amp). Fusions whose
β-lactamase activity requires maltose can be identified by plating bacteria on
Amp/maltose plates and then replica-plating onto Amp plates to identify clones which
grow on the former and do not grow on the latter. Fusions whose β-lactamase activity
requires the absence of maltose can be identified by plating bacteria on Amp plates
and then replica-plating onto Amp/maltose plates to screen for clones which grow on
the former and do not grow on the latter.
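The replica-plating logic can be summarized as a small decision table. The sketch below is illustrative only: the function name and the category labels are hypothetical, and growth is reduced to a yes/no observation on each plate type.

```python
def classify_clone(grows_on_amp, grows_on_amp_maltose):
    """Classify a clone from growth on Amp plates with and without maltose."""
    if grows_on_amp_maltose and not grows_on_amp:
        # beta-lactamase is active only when maltose is present
        return "activity requires maltose"
    if grows_on_amp and not grows_on_amp_maltose:
        # beta-lactamase is active only when maltose is absent
        return "activity requires absence of maltose"
    if grows_on_amp and grows_on_amp_maltose:
        return "active regardless of maltose"
    return "no detectable activity"

for amp, amp_maltose in [(False, True), (True, False), (True, True), (False, False)]:
    print(amp, amp_maltose, "->", classify_clone(amp, amp_maltose))
```

Only the first two categories correspond to ON/OFF switches; clones in the third category are functional fusions whose activity is not strongly coupled to maltose at the ampicillin concentration used.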

An alternative screen also is possible. The first screen is carried out as before.
On the second screen, the plates will not contain any ampicillin, but still will or will
not contain maltose (i.e., with respect to maltose, the screen is the opposite of the
first screen). Filter paper soaked in a nitrocefin solution is overlaid on the colonies
for a short period of time. Since nitrocefin is a yellow-colored compound, initially the
filter paper will be uniformly yellow (absorbance peak at 390 nm). However, those
library members with β-lactamase activity will degrade the nitrocefin to hydrolyzed
nitrocefin, which is a red compound (absorbance peak at 485 nm) (O'Callaghan, et al.,
1972, Antimicrob. Ag. Chemother. 1: 283-288). Colonies that fail to turn the filter
paper red are identified as those that lack β-lactamase activity under the chosen
conditions.

Yet another screen is also possible which relies on the use of Fluorescence
Resonance Energy Transfer (FRET) (see, e.g., Zlokarnik, et al., 1998, Science 279:
84-88). For example, the substrate CCF2/AM is not charged and can cross the
membrane of mammalian cells to enter the cytoplasm, where non-specific esterases
remove the ester functionalities of the substrate to create CCF2. In CCF2, the
cephalosporin core links a 7-hydroxycoumarin to fluorescein. In the intact molecule,
excitation of the coumarin results in FRET to the fluorescein, which emits green light.
Cleavage of CCF2 by β-lactamase results in spatial separation of the two dyes,
disrupting FRET such that excitation of the coumarin now gives rise to blue
fluorescence. Charges on CCF2 and its β-lactamase cleavage products prevent them
from leaving the cytoplasm. Thus, fluorescence-activated cell sorting (FACS) can be
performed, with and without maltose, to identify fusions in which β-lactamase
activity is dependent on maltose by monitoring FRET.

Generally, any substrate comprising a suitable FRET donor and acceptor pair can be
used to monitor the enzymatic activity of fusion molecules according to the
invention.

The above three methods will identify ON/OFF switches (i.e., switches in
which maltose has a very large effect on β-lactamase activity). In the event that such
ON/OFF switches are sufficiently rare or do not occur, and/or to identify switches in
which maltose has a more modest effect, a FRET-based method (e.g., such as based
on CCF2) or a spectrophotometric assay can be performed to screen for threshold
levels or ranges of β-lactamase activity (see, e.g., Baneyx and Georgiou, 1989,
Enzyme Microb. Technol. 11: 559-567; Sigal, et al., 1984, J. Biol. Chem. 259:
5327-32). Such an assay can be modified for high throughput screening of the activity.

In one aspect, cultures are grown of library members that exhibit β-lactamase
activity in the malE strain PM9F' (Betton and Hofnung, 1994, EMBO J. 13: 1226-
1234). When grown on minimal plates with maltose as the sole carbon source, cells
expressing desired fusions have both β-lactamase activity and the ability to bind
maltose. Such cells can be expanded in multi-well plates (e.g., such as microtiter
plates), lysed using lysozyme/detergent (e.g., Sambrook, et al., 1989, In Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor,
N.Y.), and treated with DNase and RNase. The insoluble fraction is removed by
centrifugation and the cleared lysates are assayed in the presence and absence of
maltose for β-lactamase activity by measuring a decrease in penicillin G
spectrophotometrically at Azn. Since the goal is to find differences in activity with
and without maltose, variations between library members in total fusion protein
production, growth of the cells and degree of lysis are not a significant concern.

Evaluation Of The Insertion Libraries

Sequencing was performed on random members of the insertion libraries
constructed using DNaseI or S1 nuclease (see table below). All sequences were
unique and were distributed throughout the plasmid (supporting the randomness of the
methods). Both methods created libraries with tandem duplications, direct insertions
and deletions. The data strongly suggest that the sizes of tandem duplications and
deletions in libraries created by the S1 nuclease method fall within a much narrower
range.



Table 1. Location, Orientation And Nature Of Sequences At Insertion Site For DNaseI
And S1 Nuclease Created Random Domain Insertion Libraries

Method | % in MalE gene | % in "forward" direction | Tandem Duplications (+) | Direct Insertions (0) | Deletions (-)
DNaseI-repaired library 2 | 75% (15/20) | 40% (8/20) | +18, +7, +1, +1 | 0 | -5, -13, -16, -17, -42, -48, -54, -56, -75, -162, -191, -263, -340, -379
S1 nuclease repaired | 45% (5/11) | 27% (3/11) | +5, +4 | 0 | -1, -1, -2, -2, -5, -6, -22, -101
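The claim that the S1 nuclease library spans a narrower size range can be checked directly from the values listed in Table 1. The short sketch below only summarizes those listed values (positive numbers are tandem duplications, zero is a direct insertion, negative numbers are deletions); the variable and function names are hypothetical, and no statistical significance is implied for samples this small.

```python
from statistics import median

# Sizes as listed in Table 1 for the sequenced library members.
dnase_repaired = [18, 7, 1, 1, 0, -5, -13, -16, -17, -42, -48, -54, -56,
                  -75, -162, -191, -263, -340, -379]
s1_repaired = [5, 4, 0, -1, -1, -2, -2, -5, -6, -22, -101]

def summarize(sizes):
    """Return count, extremes, span (max - min) and median of the size changes."""
    return {
        "n": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "span": max(sizes) - min(sizes),
        "median": median(sizes),
    }

print("DNaseI repaired:", summarize(dnase_repaired))
print("S1 repaired:    ", summarize(s1_repaired))
```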



Roughly 1% of the transformants that had a plasmid with a bla[24-286] insert,
regardless of the method of library construction, could grow on 50 μg/ml Amp.
Randomly selected AmpR library members were sequenced. All sequences were
unique (supporting the 'randomness' of insertion) and Table 2 describes whether they
contained deletions, tandem duplications, or neither (direct insertion) and whether
both fusion points were in-frame or not. Predominantly, the AmpR colonies had an
N-terminal fragment of the MBP gene fused in frame to bla[24-286] with the remaining
fragment of the MBP gene being out of frame. The distributions in AmpR library
members suggest that deletions predominate in the DNaseI protocol and that not
repairing plasmid linearized with S1 nuclease can bias the library toward direct
insertions (though the fraction of library members without an insert increases
significantly). In DNaseI library #2, 63% (10/16) of library members in the naive
library comprising the β-lactamase gene had it inserted in the MBP gene. This
frequency is higher than that expected based solely on the fraction of DNA in the
plasmid that codes for the MBP gene since insertions at many locations other than the
MBP gene (e.g., CmR gene, origin of replication) do not make viable, CmR plasmids.



Table 2. Comparison of Domain Insertion Libraries

Method | Distribution Of pDIMC8-Mal After Digestion | Transformants | Frequency Of Transformants With Insert | Frequency Of AmpR Colonies | Deletions (-), Direct Insertions (0), Tandem Duplications (+) In Randomly Selected AmpR Colonies | Fraction In Frame At Both Crossovers
DNaseI repaired, Library #1 | 51% supercoiled, 23% nicked, 26% linear | ~5x10* | ~0.18 | 0.0017 | (-): -95, -58, -20, -10, -5, -3, -1; (0): 0; (+): +1, +51 | 0/10
DNaseI repaired, Library #2 | 27% supercoiled, 44% nicked, 28% linear | ~10x10* | ~0.70 | 0.0079 | (-): -15, -11, -10, -8, -5; (0): 0; (+): +1 | 2/6
S1 nuclease repaired | 24% supercoiled, 42% nicked, 34% linear | 1.8x10* | ~0.25 | 0.0023 | | 0/1
S1 nuclease (not repaired) | 24% supercoiled, 42% nicked, 34% linear | 1.0x10* | ~0.06 | 0.0005 | (-): -2; (0): 0, 0, 0 | 3/4



It is desirable to eliminate members of the library which have β-lactamase
activity but consist of an N-terminal fragment of malE fused to an inserted
β-lactamase gene with the C-terminal fragment of malE being out of frame with the
inserted gene, since such members are incapable of coupling maltose
binding to β-lactamase activity.
This can be accomplished in a secondary screen by introducing the library into
the auxotrophic strain PM9F', which contains a deletion of the MBP gene, growing the
bacteria under conditions such that maltose is the sole carbon source and selecting for
MBP activity as well as for β-lactamase activity (see Figure 2D). Without a
functional MBP protein, PM9F' will not grow. In this way, fusions that have a
functional insert and can bind maltose will be identified. Table 3 shows three fusions
with both β-lactamase activity and the ability to transport maltose in E. coli
identified by this method. As can be seen, the selected fusions consist of both
tandem duplications and deletions of the maltose binding protein at the insertion site.
One caveat to this secondary screen, however, is that library members that can bind
maltose but alter the ability of MBP to interact correctly with other proteins involved
in maltose transport (e.g., MalF and MalG) will not be selected.

Table 3 summarizes locations of insertions in fusion molecules which
comprise both β-lactamase and MBP activities.



Table 3. Locations Of Insertions Found By Random Insertion With Both β-Lactamase And MBP
Activities

Sequence Of Bifunctional Fusions | Net Residues Deleted (-) Or Tandemly Duplicated (+) | Structure Inserted Into | Region Previously Found To Tolerate Short Insertions?*
MBP[1-163]-BLA-MBP[174-397] | -12 | Beta sheet | Yes
MBP[1-175]-BLA-MBP[179-397] | | Beta sheet | Yes
MBP[1-246]-BLA-MBP[238-397] | +8 | Beta sheet | No

*Duplay, et al., 1987, J. Mol. Biol. 663-73.



An analysis of eighteen randomly selected naive library members of a DNaseI-
repaired library, generated as described above, was performed to determine the exact
site and orientation of insertions in the library. Thirteen (72%) of the eighteen
members of the library included insertion sequences (BLA sequences) inserted at
random in the MBP acceptor sequences. The majority of library members (14/18) had
deletions of acceptor sequences at the insertion site, though a direct insertion and
three tandem duplications were also found. Fifty percent of the library (9/18) had
deletions and duplications of less than or equal to eighteen bases. Although large
deletions are almost certain to be deleterious for function, small deletions and tandem
duplications are an important source of diversity in the library.

From a library of 1.06 x 10^6 transformants of the DNaseI library, 0.8%
(approximately 8,000 members) could grow on 50 μg/ml LB/Amp plates, indicating a
functional β-lactamase protein. Sequencing of plasmid DNA from random AmpR
colonies showed that library members with an N-terminal fragment of the MBP gene
fused in frame to bla[24-286], with the remaining fragment of the MBP gene being out
of frame, predominated in this sublibrary. The plasmid DNA from all Amp resistant
colonies was isolated en masse and transformed into the MBP auxotroph PM9F', a
strain unable to grow on minimal media with maltose as a sole carbon source unless
the MBP is provided in trans (Betton and Hofnung, 1994, EMBO J. 13(5): 1226-1234).
In the malE auxotroph, approximately 10% (i.e., about 800 members) of the sublibrary
could grow on a 50 μg/ml Amp minimal plate containing 0.2% maltose, indicating
that MBP could transport maltose in E. coli. Analysis of these bifunctional library
members indicated that the insertions were predominantly localized to three locations
in the MBP protein: near the C-terminus, near residue 170 and near residue 210.
Randomly and non-randomly selected library members were sequenced (see Table 4
below). The sites for successful insertion correlate well with results on linker
scanning mutagenesis (random insertion of short DNA sequences) in MBP (see, e.g.,
Betton, et al., 1993, FEBS Lett. 325(1-2): 34-8).
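The successive enrichment steps can be followed with a short calculation. The sketch below assumes a starting library of 1.06 x 10^6 transformants, the value consistent with 0.8% corresponding to approximately 8,000 members; the function name is hypothetical.

```python
def selection_funnel(library_size, fractions):
    """Return the number of members remaining after each successive selection."""
    remaining = []
    current = library_size
    for fraction in fractions:
        current = current * fraction
        remaining.append(round(current))
    return remaining

# 0.8% grow on Amp; about 10% of those also grow on Amp minimal maltose plates.
amp_resistant, bifunctional = selection_funnel(1.06e6, [0.008, 0.10])
print(amp_resistant)   # members with functional beta-lactamase
print(bifunctional)    # members that also transport maltose
```

Overall, fewer than one in a thousand transformants pass both selections in this calculation.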






Table 4. Locations Of Insertions Of β-Lactamase Into MBP Where Fusions Are Bifunctional*

Sequence of randomly selected bifunctional BLA-MBP fusions | Sequence of other bifunctional BLA-MBP fusions (not randomly selected) | Sequence of functional MBP variants found by linker scanning mutagenesis**

T164-166; Δ164-170; T166-175 (2); T167; T167-170 (3); Δ168-184; T172; T179-184
Δ163-175; T163 (2); T164; E166/167; T166-167; Δ170-171; Δ175-179
Δ162-177 (3)

T213-220
Δ207-216 (3); Δ212-220 (2); E285/286

E306/307
Δ297-312 (3); Δ301 (2); Δ301-306 (3); Δ304-309 (3); Δ304-312 (3)
T318 (3)

Δ367-368; T369; O362; O367; O370
Δ367-368; T369-370

*"Δ" means deletion of the indicated MBP residues at the insertion point of BLA. "T" means a tandem
duplication of the indicated MBP sequences at the insertion point. The duplicated residues are on either
side of the BLA sequence. "E" means that insertion of BLA was exactly between the indicated residues
of MBP. "O" ("out of frame") is the number of the residue of MBP that the N-terminus of BLA is fused
to; the remaining sequence is the out-of-frame sequence that the C-terminus of BLA is fused to. For the
BLA-MBP fusion proteins, the number in parenthesis is the number of times the sequence was found.
For the linker scanning mutagenesis, the number in parenthesis is the number inserted into MBP.

**Betton, et al., 1993, FEBS Lett. 325(1-2): 34-8.



Identification of Switches 

In an initial examination of the behavior of these bifunctional proteins,
overnight inocula of PM9F' cells bearing nine of the sequenced members of the
library were lysed by French press and the soluble fraction assayed by nitrocefin
hydrolysis (O'Callaghan, et al., 1972, Antimicrob. Ag. Chemother. 1: 283-288) with
and without 50 mM maltose. One member, T369-370 (i.e., comprising a β-lactamase
inserted such that amino acids 369 and 370 of MBP were tandemly duplicated on
either side), exhibited an increase in velocity in the presence of maltose but not
sucrose. Amino acid 370 is the last amino acid of MBP; thus, T369-370 is essentially
an end-to-end fusion. Removal of amino acid residues 369 and 370 from the
C-terminus to produce an exact end-to-end fusion ("MBP-BLA") resulted in a fusion
that exhibited a stimulation of nitrocefin hydrolysis in the presence of maltose of the
same magnitude as T369-370. It was unexpected that such an end-to-end fusion
would result in a switch since end-to-end fusions of MBP and BLA with linkers have
not been reported to behave as switches (see, e.g., Betton, et al., 1997, Nat.
Biotechnology 15: 1276-1279). In addition, the β-lactamase activity of one of the
other nine bifunctional proteins tested that has a similar sequence (Δ367-368) was not
modulated by maltose.

To identify other switches, a semi-rapid throughput assay was developed in
which cultures of random bifunctional library members were grown in 96-well format
in the presence of IPTG, resulting in the accumulation of the bifunctional protein in
the media. The cultures were centrifuged to pellet the cells and the media was assayed
spectrophotometrically for the velocity of β-lactamase hydrolysis of nitrocefin in the
presence and absence of 5 mM maltose in a 96-well format. The concentration of
nitrocefin used was the same as the Km for nitrocefin of wild-type β-lactamase so
that switches in which maltose binding affected either kcat or Km could be identified.
Any culture in which there was a difference in rate of more than 20% between the
measurements with and without maltose (a comparison that eliminates differences due
to variability in protein production) was selected for further investigation. In a
screening of 303 library members, a second library member that showed an increase in
velocity of nitrocefin hydrolysis in the presence of maltose, but not in the presence of
sucrose or glucose, was found three times: T164-165 (i.e., β-lactamase was inserted
such that amino acids 164 and 165 of MBP were tandemly duplicated on either side).
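The reason for assaying at a substrate concentration near Km can be seen from the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]): at [S] = Km the velocity responds to a change in either Vmax (and hence kcat) or Km. The sketch below applies the 20% cut-off to hypothetical parameter values chosen only for illustration; the function names and numbers are assumptions, not data from the screen.

```python
def velocity(substrate, vmax, km):
    """Michaelis-Menten initial velocity."""
    return vmax * substrate / (km + substrate)

def is_candidate_switch(v_with_maltose, v_without_maltose, threshold=0.20):
    """Flag a culture whose rate differs by more than `threshold` with vs. without maltose."""
    return abs(v_with_maltose - v_without_maltose) / v_without_maltose > threshold

S = 45.0  # hypothetical nitrocefin concentration, set equal to the baseline Km

# Maltose raises Vmax by 40% at constant Km: rate changes by 40%.
print(is_candidate_switch(velocity(S, 1.4, 45.0), velocity(S, 1.0, 45.0)))
# Maltose halves Km (90 -> 45) at constant Vmax: rate changes by 50%.
print(is_candidate_switch(velocity(S, 1.0, 45.0), velocity(S, 1.0, 90.0)))
# A 5% change in Vmax stays below the cut-off.
print(is_candidate_switch(velocity(S, 1.05, 45.0), velocity(S, 1.0, 45.0)))
```

Because both measurements come from the same culture, the unknown enzyme concentration cancels in the ratio, so the comparison does not depend on how much protein each well produced.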

The criteria for bifunctionality in the above screens were quite stringent: the
fusions were required to have β-lactamase activity and to be able to transport
maltose in E. coli. Transport requires maltose binding, a conformational change in
MBP upon maltose binding, and the requisite interactions with membrane proteins MalG
and MalF. Thus, library members that bind maltose but cannot interact with MalG and
MalF are not selected (and are not bifunctional by definition). The sites for successful
insertion of β-lactamase into MBP to make a bifunctional protein correlate quite well
with permissive sites in MBP that tolerate short insertions/deletions (Betton, et al., 1993,
FEBS Lett. 325(1-2): 34-8) and protein bisection (Betton and Hofnung, 1994, EMBO J.
13(5): 1226-1234). Thus, the striking observation of those studies, that permissive sites
were often within α-helical and β-strand structural elements, is repeated here. Bifunctional
fusion Δ163-175 deletes an entire β-sheet and bifunctional fusion T213-220 tandemly
duplicates two-thirds of an α-helix. Permissive sites for random insertions of GFP into the
cAMP-dependent protein kinase regulatory subunit have also included ones within
α-helices (Biondi, et al., 1998, Nucleic Acids Res. 26(21): 4946-4952).
Two of the five permissive sites for linker scanning mutagenesis and protein
fragment complementation (~133 and ~285) were not observed to be permissive for
domain insertion in this study. However, in a previous study, β-lactamase, with 4-5
amino acid linkers on each end, was successfully inserted into MBP at 133 (Betton, et al.,
1997, Nat. Biotechnology 15: 1276-1279), suggesting that linkers may be required at this
site. The reason that insertions at 285 were not found could be that insertions at this
location (a) do not result in folded proteins, (b) are not conducive to β-lactamase activity
or maltose binding, or (c) prevent the correct association of MBP with membrane proteins
MalG and MalF, an association required for maltose transport. However, with regard to
the latter possibility, the sites of interaction between MBP and MalG and MalF (amino
acids 13, 14 and 210, which were identified by genetic analysis (Hor and Shuman, 1993,
J. Mol. Biol. 233(4): 659-70)) are not near 285.

Kinetic Characterization of Switches 

In one aspect, the kinetic constants and binding constants of the original
wildtype genes, the two switches (T164-165 and MBP-BLA) and two bifunctional
non-switches with similar sequences to the switches (T164 and Δ367-368) were
determined from Eadie-Hofstee plots and Eadie plot equivalents, respectively, using a
spectrophotometric assay for nitrocefin hydrolysis (Sigal, et al., 1984, J. Biol. Chem.
259(8): 5327-32). The results of this assay are summarized in Table 5, below.
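An Eadie-Hofstee plot rearranges the Michaelis-Menten equation to v = Vmax - Km(v/[S]), so a straight-line fit of v against v/[S] has slope -Km and intercept Vmax. The following is a minimal sketch of such a fit using ordinary least squares on synthetic, noise-free data; the function name and the data are hypothetical and do not reproduce the measurements behind Table 5.

```python
def eadie_hofstee_fit(substrate, velocity):
    """Fit v = Vmax - Km * (v/[S]) by least squares; return (Km, Vmax)."""
    x = [v / s for s, v in zip(substrate, velocity)]  # v/[S] on the x-axis
    y = list(velocity)                                # v on the y-axis
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept

# Synthetic data generated from Km = 45 and Vmax = 1.0 (arbitrary units).
substrate = [10.0, 20.0, 50.0, 100.0, 200.0]
velocity = [1.0 * s / (45.0 + s) for s in substrate]

km, vmax = eadie_hofstee_fit(substrate, velocity)
print(f"Km = {km:.1f}, Vmax = {vmax:.2f}")
```

A linear Eadie-Hofstee plot is itself evidence that the enzyme follows Michaelis-Menten kinetics; systematic curvature would indicate that a single Km and Vmax do not describe the data.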



Table 5. Kinetic And Binding Constants Of β-Lactamase-MBP Molecular Switches*

Sequence | Kd maltose (μM) | Km nitrocefin (μM), 5 mM maltose | Km nitrocefin (μM), no maltose | kcat(+maltose)/kcat(-maltose) | (kcat/Km)(+maltose)/(kcat/Km)(-maltose)
β-lactamase + MBP** | | 47 ± 6 | 44 ± 3 | 1.0 ± 0.1 | 1.0 ± 0.2
T164-165 | 3.2 ± 1.0 | 45 ± 4 | 61 ± 8 | 1.4 ± 0.1 | 1.9 ± 0.3
T369-370 | ~10 | ~42 | ~34 | ~1.7 |
MBP-BLA | 14 ± 7 | 46 ± 3 | 30 ± 3 | 1.8 ± 0.1 | 1.2 ± 0.2

*Conditions: 22 °C, 0.1 M phosphate (pH 7.0), 1 mM EDTA (+5 mM maltose where indicated);
**β-lactamase and MBP present as separate proteins; †Schwartz, Kellermann et al., 1976.



The Eadie-Hofstee plots were linear, indicating that the Michaelis-Menten equation
holds for the switches. The dissociation constant of the switches for maltose was
determined using the change in velocity of nitrocefin hydrolysis as a signal. The
absolute values of kcat are not known since the total protein concentration is not
known. The relative kcat values (and also the relative specificity constants) that
compare with and without maltose can be determined because the enzyme
concentration, though unknown, is the same for both measurements of Vmax. The
measurements of Km for nitrocefin observed herein closely match that of a previous
study (54.7 μM) (see, Raquet, et al., 1994, J. Mol. Biol. 244(5): 625-39).
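Because the enzyme concentration cancels, the relative specificity constant follows from the relative kcat and the two Km values: (kcat/Km)(+maltose)/(kcat/Km)(-maltose) = [kcat(+)/kcat(-)] x [Km(-)/Km(+)]. The sketch below applies this identity to the central values reported in Table 5 as a consistency check; the function name is hypothetical and the error estimates are ignored.

```python
def relative_specificity(relative_kcat, km_with_maltose, km_without_maltose):
    """Ratio of kcat/Km with maltose to kcat/Km without maltose."""
    return relative_kcat * km_without_maltose / km_with_maltose

# Central values from Table 5: relative kcat, Km (+maltose), Km (-maltose), in uM.
t164_165 = relative_specificity(1.4, 45.0, 61.0)
mbp_bla = relative_specificity(1.8, 46.0, 30.0)

print(f"T164-165: {t164_165:.1f}")
print(f"MBP-BLA:  {mbp_bla:.1f}")
```

For T164-165 the higher kcat and the lower Km reinforce each other, whereas for the end-to-end fusion the higher Km offsets most of the gain in kcat.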

The end-to-end fusion shows a larger increase in kcat than T164-165 did (80%
vs. 40%), but this is compensated for by an increase in Km for the end-to-end fusion.
T164-165 shows both an increase in kcat and a decrease in Km in the presence of
maltose and also shows an increase of kcat/Km (90%) in the presence of maltose.
T164-165 was also the most sensitive switch, with a Kd for maltose close to that of
the wildtype MBP. All of the above kinetic characterization was performed on the
media fraction; however, T164-165, to which a His-tag has been added, was
purified by nickel affinity chromatography to high purity and has been shown to
exhibit switching behavior comparable to what was seen in the media fraction.

Switching Behavior Correlates With A Conformational Change in MBP

Although MBP can bind many other linear maltodextrins, cyclodextrins and
reduced or oxidized variants thereof, only those ligands which induce a
conformational change in MBP (Hall, et al. (1997) J. Biol. Chem. 272(28): 17605-17609;
Hall, et al. (1997) J. Biol. Chem. 272(28): 17610-17614) elicited switching behavior (see
Figure 8). Binding of β-cyclodextrin (which does not produce a conformational change) was
confirmed by competition experiments in which maltose's effect on β-lactamase could be
competed away with these sugars. This suggests conformational change in MBP upon ligand
binding as a mechanism for the coupling achieved between maltose binding and nitrocefin
hydrolysis.

The switches apparently function as monomeric enzymes that derive from the
covalent linkage of non-interacting, monomeric proteins with the prerequisite binding
and catalytic functionalities, respectively.



Example 2. MBP-GFP Fusions

Maltose Binding Protein (MBP) and GFP fusion molecules are generated
essentially as described above.

Selection Of Active Fusions: GFP-MBP 

E. coli cells expressing GFP can be sorted based on fluorescence and other
parameters using flow cytometry (Daugherty, et al., 2000, Proc. Natl. Acad. Sci. USA
97: 2029-34). Initially, E. coli cells expressing a GFP-MBP fusion library are
screened to identify cells which have significant green fluorescence when grown in the
presence of maltose (provided both in the growth medium and during the sorting
process) as well as to identify cells that have significant green fluorescence without
maltose (absent both in the growth medium and during the sorting process). Cells
selected are re-cultured and cells are sorted for the absence of, or a decrease in,
fluorescence under the opposite condition (e.g., in the absence of maltose where cells
were previously grown in the presence of maltose, and in the presence of maltose
where cells were previously grown in the absence of maltose). Cells selected in this
second sorting process are plated on LB plates with the level of maltose from the first
sort to confirm that a lack of fluorescence is not due to reasons other than the effect of
maltose (e.g., such as loss of plasmid, deletion of the MBP gene, mutations, etc.).

As in Example 1, secondary screens can be used to eliminate library members
in which the insertion sequence and the acceptor sequence are out of frame.

Example 3. Generation of Conditional Heterodimers

As a model system, control over the neomycin resistance protein (Neo)
(aminoglycoside phosphotransferase APH(3')-IIa) by conditional heterodimerization is
engineered. Incremental truncation libraries of fragments of Neo are used to identify
bisection locations in Neo that do not abolish activity by selection on plates that contain
kanamycin.

Design Of Overlapping Fragments Of Neo

To avoid the possibility of individual fragments of Neo being active on their own,
the starting fragments for incremental truncation are designed such that they lack essential
residues for functionality because they are already N-terminally or C-terminally
truncated. The seven classes of APHs have very little general sequence homology
(Wright, 1999, Front. Biosci. 4: D9-21). However, a sequence alignment of representative
members of each class, combined with the known functions of residues in APH(3')-IIIa
(Wright and Thompson, 1999, Front. Biosci. 4: D9-21), suggests that C-terminal fragment
Neo[51-264] will be inactive since it lacks K50 (equivalent to K44 in APH(3')-IIIa) and
that N-terminal fragment Neo[1-207] will be inactive since it lacks D208 (equivalent to
D208 in APH(3')-IIIa). This is a very conservative selection of fragments as it is likely
that fragments longer than the ones chosen will also be inactive on their own.
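The design rule, that each starting fragment must lack at least one essential residue while the two fragments still overlap, can be expressed as a short check. In the sketch below only the two residues named in the text (K50 and D208) are treated as essential; the dictionary and function names are hypothetical.

```python
# Essential Neo residues named in the text, keyed by label with their positions.
ESSENTIAL_RESIDUES = {"K50": 50, "D208": 208}

def missing_essential(start, end):
    """Return the essential residues that fall outside the fragment [start, end]."""
    return sorted(name for name, pos in ESSENTIAL_RESIDUES.items()
                  if not start <= pos <= end)

def overlap(fragment_a, fragment_b):
    """Return the shared residue range of two fragments, or None if they do not overlap."""
    start = max(fragment_a[0], fragment_b[0])
    end = min(fragment_a[1], fragment_b[1])
    return (start, end) if start <= end else None

n_terminal = (1, 207)    # Neo[1-207]
c_terminal = (51, 264)   # Neo[51-264]

print(missing_essential(*n_terminal))   # residues absent from the N-terminal fragment
print(missing_essential(*c_terminal))   # residues absent from the C-terminal fragment
print(overlap(n_terminal, c_terminal))  # region present in both fragments
```

Each fragment lacks one essential residue on its own, while together they cover the full sequence with a shared region from residue 51 to residue 207.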

Incremental truncation libraries of the same overlapping fragments are fused to
parallel and antiparallel leucine zippers and are selected on plates containing kanamycin.
Preferably, cotransformants are plated on increasing amounts of kanamycin and plated
under different conditions (temperature and IPTG level) to select for heterodimers of Neo
that confer kanamycin resistance. Plasmid DNA from randomly selected KanR colonies
is isolated and re-transformed separately, and together, to confirm that the KanR
phenotype requires both vectors. The plasmid DNA is then sequenced to identify the
DNA that codes for complementing fragments.

Neo fragments that are functional only when fused to leucine zippers can thus be
identified. Fusion molecules whose assembly occurs when fused to leucine zippers (e.g.,
forming functional Neo polypeptides) can be subjected to directed evolution (Arnold, et
al., 2001, Trends Biochem. Sci. 26: 100-6) to overcome these shortcomings.

Fragments improved by directed evolution (e.g., pairs of fusion molecules
which display at least 2-fold greater activity, preferably, at least 5-fold, and more
preferably, at least ten-fold greater activity) are fused to dimerization domains that
require a CID, thereby coupling Neo activity to the presence or absence of the CID and
creating Neo activity that is dependent on the CID. For example, fragments of Neo can be
fused to GyrB and tested to see if kanamycin resistance depends on coumermycin, or
to FK506-binding protein (FKBP) and tested to see if kanamycin resistance depends on
rapamycin. Preferably, fragments whose activities are improved are sequenced to
identify relationships between types of mutations and increases in activity. In some
aspects, fragments whose activities are not improved or which are actually diminished
also are sequenced.

Construction Of Control Vector

The neo gene is amplified from plasmid pSV2-Neo by overlap
extension PCR (to remove an internal NcoI site that creates problems for
doing the C-terminal truncation) and cloned into the NdeI/SpeI sites of
pDIM-N2 to create pDIM-N2-Neo(NcoI*).

Construction And Testing Of Vectors For Incremental Truncation For Protein
Fragment Complementation (No Leucine Zippers)

The DNA coding for fragments Neo[1-207] and Neo[51-264] is amplified by
PCR from pDIM-N2-Neo(NcoI*) and cloned into the NdeI/BamHI sites of pDIMN2
and the BglII/SpeI sites of pDIMC8, respectively. The MIC of kanamycin on DH5α
on LB plates is determined to verify that pDIMN2-Neo[1-207] and pDIMC8-Neo[51-264],
either separately or together, do not increase the MIC (i.e., to confirm that these
fragments are not active by themselves).
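
The MIC comparison described above can be illustrated with a short Python sketch. It assumes plate readings are recorded as a mapping from kanamycin concentration to whether growth was seen; the function names and all readings below are invented for illustration and are not data from this example.

```python
def minimum_inhibitory_concentration(growth_by_conc):
    """Lowest tested concentration (ug/ml) with no growth, or None if all grew."""
    inhibitory = [conc for conc, grew in growth_by_conc.items() if not grew]
    return min(inhibitory) if inhibitory else None


def raises_mic(host_readings, test_readings):
    """True if the transformed cells have a higher MIC than the host alone."""
    host_mic = minimum_inhibitory_concentration(host_readings)
    test_mic = minimum_inhibitory_concentration(test_readings)
    if host_mic is None:
        return False  # host already grows at every tested concentration
    return test_mic is None or test_mic > host_mic


# Invented plate readings: kanamycin concentration (ug/ml) -> growth seen?
host_alone = {2: True, 4: True, 8: False, 16: False}
both_fragments = {2: True, 4: True, 8: False, 16: False}
intact_neo = {2: True, 4: True, 8: True, 16: True}

print(raises_mic(host_alone, both_fragments))  # fragments inactive on their own
print(raises_mic(host_alone, intact_neo))      # positive control
```

With readings like these, the fragment pair leaves the MIC unchanged while an intact gene raises it, which is the outcome the verification step is meant to confirm.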

Determination Of The Maximum Rate Of Recombination

Recombination between pDIMN2 and pDIMC8 plasmids, even in recA
mutants, can reassemble an intact gene (see, e.g., Ostermeier et al., 1999, Proc. Natl.
Acad. Sci. USA 96: 3562-3567). Thus, in one aspect, the maximum frequency of
recombination is determined by co-transforming pDIMN2-Neo[1-207] and pDIMC8-
Neo[51-264] and plating a large number of cells on plates containing various amounts
of kanamycin to identify clones in which neomycin activity is restored (e.g., clones in
which recombination is likely to have occurred). This provides a baseline for
determining the amount of background in the library (e.g., the likely number of false
positive results obtained).

Construction And Testing Of Incremental Truncation Libraries Without
Leucine Zippers

Individual incremental truncation libraries (~1x10^ each) were constructed by
a protocol previously described by Ostermeier, et al., 2002, In Protein-Protein
Interactions: A Molecular Cloning Manual, E. Golemis. Cold Spring Harbor, NY,
Cold Spring Harbor Laboratory Press. PCR (with primers outside the truncation
region) on random colonies confirmed the desired range of truncation. These libraries
were co-transformed into DH5α to create a library of 2.5 x 10^6 transformants, an
order of magnitude larger than the number of possible combinations (= 471^2) of
truncation lengths of the two libraries. These libraries were then plated at 22°C and
37°C on plates with or without IPTG containing 5 or 50 µg/ml kanamycin. The
frequency of colonies was not a significant function of growth temperature or IPTG
and averaged 0.00022 CFU (5 µg/ml Kan)/CFU (no Kan) and 0.00005 CFU (50
µg/ml Kan)/CFU (no Kan). Twenty-seven colonies were analyzed and found to be
'large-plasmid' recombinants or pDIM-N2-Neo(NcoI*) contamination. Thus, the Neo
gene cannot be fragmented between DNA coding for residues 51 and 207 to produce
two gene fragments capable of producing enough protein with enough activity to
provide kanamycin resistance above background. In other words, Neo is not amenable
to unassisted protein reassembly.
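
The library-size arithmetic in this passage can be checked with a minimal Python sketch. The figure of 471 truncation lengths is consistent with counting every nucleotide position in the DNA coding for residues 51 through 207; that derivation is an assumption here, as is the library size of 2.5 x 10^6 transformants used in the coverage estimate. The function name is hypothetical.

```python
# Truncation window from the text: DNA coding for residues 51 through 207.
first_residue, last_residue = 51, 207
bases_per_codon = 3

# Assumed derivation: one possible truncation endpoint per nucleotide.
truncation_lengths = (last_residue - first_residue + 1) * bases_per_codon

# Each co-transformant pairs one endpoint from each of the two libraries.
combinations = truncation_lengths ** 2


def fold_coverage(transformants, combos=combinations):
    """Ratio of library size to the number of possible endpoint pairs."""
    return transformants / combos


print(truncation_lengths)              # endpoints per library
print(combinations)                    # possible endpoint pairs
print(round(fold_coverage(2.5e6), 1))  # coverage under the assumed library size
```

Under these assumptions the co-transformed library exceeds the number of endpoint pairs by roughly an order of magnitude, in line with the description.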

Construction Of Incremental Truncation Libraries Of Neo Fused To
Antiparallel Leucine Zippers

The individual incremental truncation libraries were constructed such that
fragments of Neo were fused on the truncation side to DNA coding for antiparallel
leucine zippers based on those designed by Ghosh, et al., 2000, J. Am. Chem. Soc.
122: 5658. Three different libraries were constructed, varying in the nature of the
flexible linker between the leucine zipper and the truncated gene: (a) no linker, (b)
GSGG linker and (c) GSGGGSGG linker. The frequency of Kan^R colonies was not a
significant function of IPTG; however, approximately 4-10 fold more colonies grew
at 22°C than at 37°C, suggesting folding/aggregation problems in many of the
fragments. The frequency of recombination was found to be stimulated by the
presence of the zipper sequences, though the level of recombination was 2-4 lower
than the maximum frequency of recombination determined earlier. The frequency of
Kan^R colonies that were not recombinants ('true positives'; at 37°C on plates without
IPTG) is shown in Figure 7A as a function of kanamycin concentration. Libraries
with fragments of Neo fused to parallel leucine zippers also resulted in conditional
heterodimers with similar sequences, but at a significantly lower frequency.

True positives were randomly selected and the DNA of the fragments
sequenced. The plasmid DNA from these true positives was retransformed to confirm
that Kan^R only resulted from the presence of both plasmids. Thus, the method
demonstrates the successful generation of molecular switches that form an active
aminoglycoside phosphotransferase IIa (Neo) protein (capable of inactivating the
antibiotic kanamycin by phosphorylation) only when fused to antiparallel leucine
zippers. Upwards of twenty distinct heterodimers whose bisection loci cluster in three
regions (Figure 7B) have been readily identified through selection on kanamycin plates
even though amenable loci pairs occur at a frequency of less than 1 for every 2000
possible bisection loci. These fragments often had significant overlap and some loci were
proximal to the active site, making it unlikely these loci could have been identified
through rational design.
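
The overlap between the two fragments of a conditional heterodimer can be computed directly from its name. The sketch below is illustrative only; it assumes the naming pattern Neo[a-b]zip/zipNeo[c-d] used for the two heterodimers named later in this example, and the function name is hypothetical.

```python
import re

# Naming pattern assumed from the heterodimer names used in this example.
PAIR_PATTERN = re.compile(r"Neo\[(\d+)-(\d+)\]zip/zipNeo\[(\d+)-(\d+)\]")


def overlap_length(pair_name):
    """Number of residues present in both fragments of a heterodimer."""
    match = PAIR_PATTERN.fullmatch(pair_name)
    if match is None:
        raise ValueError(f"unrecognized heterodimer name: {pair_name!r}")
    _n_start, n_end, c_start, _c_end = map(int, match.groups())
    # Residues c_start..n_end are shared; no overlap gives zero.
    return max(0, n_end - c_start + 1)


print(overlap_length("Neo[1-59]zip/zipNeo[59-264]"))  # 1 shared residue
print(overlap_length("Neo[1-91]zip/zipNeo[78-264]"))  # 14 shared residues
```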

Although conversion to a conditional heterodimer severely compromised the
Neo resistance of cells by approximately two orders of magnitude, high level Neo
resistance (in one case, up to wildtype levels of ~500 µg/ml) has been restored by one
round of random mutagenesis (using error-prone PCR under conditions such that
approximately one mutation per fragment results) and selection on 10^ variants of two
different conditional heterodimers (Neo[1-59]zip/zipNeo[59-264] and
Neo[1-91]zip/zipNeo[78-264]). For the case of Neo[1-59]zip/zipNeo[59-264] the following
sets of mutations were found in a random sampling of the improved variants that could
grow at ~500 µg/ml: C31R/K175E/V198E, C31R/M120L, N58S/R177S/V198E,
C31R/D52Q/D118E/Q155L. The improvement ostensibly resulted from improved
kinetic properties of the conditional heterodimers, since the two "evolved",
zipperless Neo fragments (Neo fragments with mutations but without leucine zippers)
could not provide kanamycin resistance and the expression levels of the "unevolved"
heterodimers and the "evolved" heterodimers (both with leucine zippers) were very
similar as determined by a quantitative ELISA using antibodies against Neo.
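
Recurring mutations among the improved variants can be tallied with a short Python sketch. It assumes the four mutation sets read as C31R/K175E/V198E, C31R/M120L, N58S/R177S/V198E and C31R/D52Q/D118E/Q155L; that reading of the scanned text, and the helper name, are assumptions made for illustration.

```python
import re
from collections import Counter

# One substitution: wild-type residue, position, replacement residue.
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")

# Mutation sets as read from the text (an assumed reading of the scan).
variants = [
    "C31R/K175E/V198E",
    "C31R/M120L",
    "N58S/R177S/V198E",
    "C31R/D52Q/D118E/Q155L",
]


def parse_variant(variant):
    """Split a variant into (wild_type, position, replacement) tuples."""
    parsed = []
    for token in variant.split("/"):
        match = MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        wild_type, position, replacement = match.groups()
        parsed.append((wild_type, int(position), replacement))
    return parsed


# Count how many sampled variants carry each substitution.
counts = Counter(token for variant in variants for token in variant.split("/"))
recurring = sorted(name for name, n in counts.items() if n > 1)
print(recurring)
```

Substitutions that appear in more than one independently sampled variant are natural candidates when looking for relationships between mutations and increases in activity.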

Variations, modifications, and other implementations of what is described
herein will occur to those of ordinary skill in the art without departing from the spirit
and scope of the invention and the following claims.

All patents, patent applications, and publications referenced herein are
incorporated in their entirety herein.

What is claimed is:

CLAIMS

1. A method for assembling a modulatable fusion molecule, comprising:

randomly inserting an insertion sequence into an acceptor sequence,
wherein the insertion sequence and the acceptor sequence each comprise a
state, thereby generating a fusion molecule; and

selecting a fusion molecule wherein insertion couples the state of the
insertion sequence to the state of the acceptor sequence.

2. The method according to claim 1, wherein the state of the insertion sequence
is modulated.

3. The method according to claim 2, wherein the state of the insertion sequence
is modulated in response to a change in the state of the acceptor sequence.

4. The method according to claim 1, wherein the state of the acceptor sequence is
modulated.

5. The method according to claim 4, wherein the state of the acceptor sequence is
modulated in response to a change in the state of the insertion sequence.

6. The method according to claim 1, wherein the fusion molecule comprises a
new state.

7. A method for assembling a fusion molecule comprising an insertion site, the
method comprising:

inserting an insertion sequence into an acceptor sequence, thereby
generating a fusion molecule, wherein the insertion sequence and the acceptor
sequence each comprise a state;

generating a duplication, deletion, or substitution at the insertion site
in the acceptor sequence; and

selecting a fusion molecule wherein insertion couples the state of the
insertion sequence to the state of the acceptor sequence.

8. The method according to claim 7, wherein the generating step occurs prior to
the inserting step.

9. The method according to claim 7, wherein the state of the insertion sequence
is modulated.

10. The method according to claim 9, wherein the state of the insertion sequence is
modulated in response to a change in the activity of the acceptor sequence.

11. The method according to claim 7, wherein the state of the acceptor sequence is
modulated.

12. The method according to claim 11, wherein the state of the acceptor sequence
is modulated in response to a change in the state of the insertion sequence.

13. The method according to claim 7, wherein the fusion molecule comprises a
new state.

14. A method for assembling a multistable fusion molecule which can switch
between at least an active state and a less active state, comprising:

randomly inserting an insertion sequence into an acceptor sequence,
thereby generating a fusion molecule, wherein either the insertion sequence or
the acceptor sequence comprises a state; and wherein the respective other
sequence is responsive to a signal;

selecting a fusion molecule, wherein the state is coupled to the signal,
such that the fusion molecule switches state in response to the signal.

15. A method for assembling a fusion molecule, comprising:

randomly inserting an insertion sequence responsive to a signal into an
acceptor sequence comprising a state, thereby generating a fusion molecule;

selecting for a fusion molecule wherein the state of the acceptor sequence is
responsive to the signal.

16. The method according to any of claims 1, 7, 14, and 15, wherein said insertion
sequence and acceptor sequence comprise polypeptides.

17. The method according to claim 16, wherein said inserting comprises obtaining
a first nucleic acid fragment encoding said insertion polypeptide and a second
nucleic acid fragment encoding said acceptor polypeptide and randomly
inserting said first nucleic acid fragment into said second nucleic acid
fragment.

18. The method according to claim 17, further comprising the step of digesting the 
second nucleic acid fragment with a nuclease. 

19. The method according to claim 17, comprising the step of generating random
fragments of nucleic acid sequences and inserting a fragment at random into a
nucleic acid encoding the acceptor sequence.

20. The method according to claim 17, wherein the step of generating random
fragments comprises exposing a nucleic acid sequence encoding the acceptor
sequence to a nuclease, mechanically shearing the nucleic acid, exposing the
nucleic acid to a chemical, and/or exposing the nucleic acid sequence to
radiation.

21. The method according to claim 20, wherein the nuclease is selected from the
group consisting of one or more of: DNase I, S1 nuclease, mung bean
nuclease, and a restriction endonuclease.

22. The method according to claim 17, further comprising the step of randomly
inserting first nucleic acid fragments into second nucleic acid fragments, a
plurality of times sequentially or simultaneously.

23. The method according to claim 1, further comprising providing a library of
acceptor polypeptides comprising randomly inserted insertion polypeptide
sequences, and selecting fusion polypeptides wherein the states of the
insertion and acceptor polypeptides are coupled.

24. The method according to claim 22, wherein the step of inserting a plurality of
times generates a library of nucleic acid molecules expressing fusion
polypeptides comprising acceptor polypeptides which comprise randomly
inserted insertion polypeptide sequences.

25. The method according to claim 22, further comprising selecting fusion
polypeptides in which the state of the insertion polypeptide sequence is
coupled to the state of the acceptor polypeptide sequence.

26. A method for modulating a cellular activity, comprising:

providing a fusion molecule generated according to the method of any
of claims 1, 7, 14, and 15 to a cell, wherein a change in state of at least the
insertion sequence or the acceptor sequence modulates a cellular activity, and
wherein the change in state which modulates the cellular activity is coupled to
a change in state of the respective other portion of the fusion molecule; and

changing the state of the respective other portion of the fusion
molecule, thereby modulating the cellular activity.

27. A method for delivering a bio-effective molecule to a cell, comprising:

providing a fusion molecule associated with a bio-effective molecule
to the cell, the fusion molecule comprising an insertion sequence and an
acceptor sequence, wherein either the insertion sequence or the acceptor
sequence binds to a cellular marker of a pathological condition and wherein
upon binding to the marker, the fusion molecule dissociates from the bio-
effective molecule, thereby delivering the molecule to the cell.

28. A method for delivering a bio-effective molecule intracellularly, comprising:

providing a fusion molecule associated with a bio-effective molecule
to the cell, the fusion molecule comprising an insertion sequence and an
acceptor sequence,

wherein either the insertion sequence or acceptor sequence comprises a
transport sequence for transporting the fusion molecule intracellularly, and

wherein release of the bio-effective molecule from the fusion molecule
is coupled to transport of the fusion molecule intracellularly.

29. The method according to claim 28, wherein either the insertion sequence or
the acceptor sequence is capable of binding to a biomolecule, and wherein
binding the fusion molecule with the biomolecule localizes the fusion
molecule comprising the bio-effective molecule intracellularly and
disassociates the bio-effective molecule from the fusion molecule.

30. A method for modulating a molecular pathway in a cell, comprising:

providing a fusion molecule to the cell, the fusion molecule comprising
an insertion sequence and an acceptor sequence,

wherein the activity of the insertion sequence and acceptor sequence
are coupled, and responsive to a signal, and

wherein the activity of either the insertion sequence or the acceptor
sequence modulates the activity or expression of a molecular pathway
molecule in the cell; and

exposing the fusion molecule to the signal.

31. A method for controlling the activity of a nucleic acid regulatory sequence,
comprising:

providing a fusion molecule, the fusion molecule comprising an
insertion sequence and an acceptor sequence,

wherein either the insertion sequence or the acceptor sequence
responds to a signal, and

wherein the respective other sequence of the fusion molecule binds to
the nucleic acid regulatory sequence when the signal is responded to; and

exposing the fusion molecule to the signal.

32. A fusion molecule, comprising:

an insertion sequence and an acceptor sequence,
wherein either the insertion sequence or the acceptor sequence
transports the fusion molecule intracellularly and wherein intracellular
transport of the fusion molecule is coupled to binding of the fusion molecule
to a bio-effective molecule.

33. A fusion molecule, comprising:

an insertion sequence and an acceptor sequence, wherein either the
insertion sequence or the acceptor sequence binds to a nucleic acid molecule,
and wherein nucleic acid binding activity is coupled to the response of the
respective other sequence of the fusion molecule to a signal.

34. A fusion molecule, comprising:

an insertion sequence and an acceptor sequence, wherein either the
insertion sequence or the acceptor sequence associates with a bio-effective
molecule, and disassociates from the bio-effective molecule, when the
respective other sequence of the fusion binds to a cellular marker of a
pathological condition.

35. A fusion molecule capable of switching from a non-toxic to a toxic state,
comprising:

an insertion sequence and an acceptor sequence, wherein either the
insertion sequence or acceptor sequence binds to a cellular marker of a
pathology, and wherein binding of the marker to the fusion protein switches
the fusion protein from a non-toxic state to a toxic state.

36. A fusion molecule capable of switching from a toxic state to a less toxic state,
comprising:

an insertion sequence and an acceptor sequence, wherein either the
insertion sequence or acceptor sequence binds to a cellular marker of a healthy
cell, and wherein binding of the marker to the fusion protein switches the
fusion protein from a toxic state to a less toxic state.

37. A molecular switch for controlling a cellular pathway, comprising:

a fusion molecule comprising an insertion sequence and an acceptor
sequence,

wherein the state of the insertion and acceptor sequence are coupled,
and responsive to a signal, and

wherein the state of either the insertion sequence or the acceptor
sequence modulates the activity or expression of a molecular pathway
molecule in a cell.

38. A sensor molecule, comprising:

an insertion sequence and an acceptor sequence,

wherein either the insertion sequence or acceptor sequence binds to a
target molecule,

wherein the respective other sequence generates a signal in response to
binding, and further,

wherein the acceptor sequence comprises a deletion, duplication, and/or
substitution at the insertion site.



39. A library, comprising a plurality of library members,
wherein each library member comprises a first nucleic acid sequence
encoding a first polypeptide having a first state, the first nucleic acid sequence
being inserted into a second nucleic acid sequence encoding a second
polypeptide having a second state, at a random insertion site in the second
nucleic acid sequence, and wherein the library comprises members comprising
insertions with deletions at the insertion site, insertions with tandem
duplications at the insertion site, and insertions with neither duplications nor
deletions.

40. A library comprising a plurality of library members comprising fusion
molecules generated according to any of claims 1, 7, 14, or 15.

41. A method for generating a conditional heterodimer, comprising:

providing a plurality of randomly bisected molecules;

each bisected molecule comprising a first half and a second half,
wherein the first and second half are fused to first and second dimerization
domains respectively, and wherein a function of the bisected molecule is
altered by bisection,

selecting for restoration of function of a bisected molecule in response
to a signal.

42. A method for modulating a cellular activity comprising: providing a
conditional heterodimer obtained by the method of claim 41 to a cell that lacks
the function of the molecule.

43. The method according to claim 42, further comprising: exposing the cell to the
signal.

44. The method according to claim 43, wherein the signal comprises the presence,
absence or level of a CID molecule.